Patent Abstract:
ADAPTIVE PROCESSING WITH SEVERAL MEDIA PROCESSING NODES. The present invention relates to techniques for adaptive processing of media data based on separate data that specifies a state of the media data. A device in a media processing chain can determine whether a type of media processing has already been performed on an input version of media data. If so, the device can adapt its processing of the media data to disable performing that type of media processing. Otherwise, the device performs that type of media processing. The device can create a state of the media data specifying the type of media processing performed. The device can communicate the state of the media data and an output version of the media data to a receiving device in the media processing chain, to support the receiving device's adaptive processing of the media data.
Publication number: BR112013013353B1
Application number: R112013013353-8
Filing date: 2011-12-01
Publication date: 2021-05-04
Inventors: Jeffrey Riedmiller; Regunathan Radhakrishnan; Marvin Pribadi; Farhad Farahani; Michael Smithers
Applicant: Dolby Laboratories Licensing Corporation
Primary IPC:
Patent Description:

Cross-Reference to Related Applications and Priority Claim
[0001] This application claims priority to US Provisional Application No. 61/419,747, filed December 3, 2010, and US Provisional Application No. 61/558,286, filed November 10, 2011, both of which are incorporated herein by reference in their entirety for all purposes.
Technology
[0002] The present invention relates generally to media processing systems, and in particular to adaptive processing of media data based on media processing states of the media data.
Background
[0003] Media processing units typically operate in a blind manner and do not take into account the processing history of the media data that occurs before the media data is received. This may work in a media processing framework in which a single entity does all the media processing and encoding for a variety of target media rendering devices, while a target media rendering device does all the decoding and rendering of the encoded media data. However, this blind processing does not work well (or at all) in situations where a plurality of media processing units are dispersed across a diverse network, or are placed in tandem (i.e., in a chain), and are expected to optimally perform their respective types of media processing. For example, some media data may be encoded for high-performance media systems and may need to be converted to a reduced form suitable for a mobile device along a media processing chain. Consequently, a media processing unit may unnecessarily perform a type of processing on the media data that has already been performed. For example, a volume leveling unit performs processing on an input audio clip regardless of whether or not volume leveling has previously been performed on that input audio clip. As a result, the volume leveling unit performs leveling even when it is not needed. This unnecessary processing can also cause degradation and/or removal of specific characteristics while rendering the media content in the media data.
[0004] The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section. Similarly, issues identified with respect to one or more approaches should not be assumed to have been recognized in any prior art on the basis of this section, unless otherwise indicated.
Brief Description of Drawings
[0005] The present invention is illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings, in which like reference numerals refer to similar elements and in which:
[0006] Figure 1 illustrates an example media processing chain, according to some possible embodiments of the present invention;
[0007] Figure 2 illustrates an example of an improved media processing chain, according to some possible embodiments of the present invention;
[0008] Figure 3 illustrates an example encoder/transcoder, according to some possible embodiments of the present invention;
[0009] Figure 4 illustrates an example decoder, according to some possible embodiments of the present invention;
[00010] Figure 5 illustrates an example post-processing unit, according to some possible embodiments of the present invention;
[00011] Figure 6 illustrates an example implementation of an encoder/transcoder, according to some possible embodiments of the present invention;
[00012] Figure 7 illustrates an example evolution of operating control modes of a volume leveling unit, based on the validity of sound volume metadata and/or associated processing state metadata, according to some possible embodiments of the present invention;
[00013] Figure 8 illustrates an example configuration of using data hiding to pass media processing information, according to some possible embodiments of the present invention;
[00014] Figure 9A and Figure 9B illustrate examples of process flows, according to some possible embodiments of the present invention;
[00015] Figure 10 illustrates an example hardware platform on which a computer or a computing device as described herein may be implemented, according to a possible embodiment of the present invention;
[00016] Figure 11 illustrates media frames with which processing state metadata associated with the media data in the media frames may be transmitted, according to an example embodiment;
[00017] Figure 12A through Figure 12L illustrate block diagrams of some example media processing nodes/devices, according to some embodiments of the present invention.
Description of Example Possible Embodiments
[00018] Example possible embodiments, which relate to adaptive processing of media data based on media processing states of the media data, are described herein. In the following description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the present invention. It will be apparent, however, that the present invention may be practiced without these specific details. In other instances, well-known structures and devices are not described in exhaustive detail, in order to avoid unnecessarily occluding, obscuring, or obfuscating the present invention.
[00019] Example embodiments are described herein according to the following outline:
1. Overview
2. Media processing chains
3. Media processing devices or units
4. Example adaptive media data processing
5. Data hiding
6. Example process flows
7. Implementation mechanisms - hardware overview
8. Enumerated example embodiments
9. Extensions, equivalents, alternatives and miscellaneous
1. Overview
[00020] This overview presents a basic description of some aspects of a possible embodiment of the present invention. It should be noted that this overview is not an extensive or exhaustive summary of aspects of the possible embodiment. Moreover, it should be noted that this overview is not intended to be understood as identifying any particularly significant aspects or elements of the possible embodiment, nor as delineating any scope of the possible embodiment in particular, nor of the invention in general. This overview merely presents some concepts that relate to the example possible embodiment in a condensed and simplified format, and should be understood as merely a conceptual prelude to the more detailed description of example possible embodiments that follows below.
[00021] Techniques for adaptive processing of media data based on media processing states of the media data are described. In some possible embodiments, media processing units in an enhanced media processing chain are automatically enabled to retrieve and validate media processing signaling and/or processing state metadata, determine the state of the media data based on the media processing signaling and/or processing state metadata, and adapt their respective processing based on the state of the media data. The media processing units in the enhanced media processing chain may include, but are not limited to, encoders, transcoders, decoders, pre-processing units, post-processing units, bitstream processing tools, Advanced Television Systems Committee (ATSC) codecs, Moving Picture Experts Group (MPEG) codecs, etc. A media processing unit may be a media processing system or a part of a media processing system.
[00022] As used herein, the term "processing state metadata" refers to data that is separate and different from the media data; the media data (e.g., video frames, perceptually coded audio frames, or PCM audio samples containing media content) refers to media sample data that represents the media content and is used to render the media content as audio or video output. Processing state metadata is associated with the media data and specifies what types of processing have already been performed on the media data. This association of processing state metadata with media data is time-synchronous. Thus, present processing state metadata indicates that the present media data contemporaneously comprises the results of the indicated types of media processing and/or a description of media characteristics in the media data. In some possible embodiments, processing state metadata may include processing history and/or some or all of the parameters that are used in, and/or derived from, the indicated types of media processing. Additionally and/or optionally, the processing state metadata may include media characteristics of one or more different types computed/extracted from the media data. Media characteristics, as described herein, provide a semantic description of the media data and may comprise one or more of structural properties, tonality including harmony and melody, timbre, rhythm, reference loudness, stereo mix, a number of sound sources of the media data, absence or presence of voice, repetition characteristics, melody, harmony, lyrics, timbre, perceptual characteristics, digital media characteristics, stereo parameters, voice recognition (e.g., what a speaker is saying), etc. Processing state metadata may also include other metadata that is not related to, or derived from, any processing of the media data. For example, third-party data, tracking information, identifiers, proprietary or standard information, user annotation data, user preference data, etc., may be added by a particular media processing unit to be passed on to other media processing units. These independent types of metadata can be distributed to or from, validated by, and used by a media processing component in the media processing chain. The term "media processing signaling" refers to relatively lightweight control or state data (which may be a small amount of data relative to the processing state metadata) that is communicated between media processing units in a media bitstream. The media processing signaling may comprise a subset or a summary of the processing state metadata.
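To make the shape of such metadata concrete, the following is a minimal illustrative sketch (not taken from the patent) of how a processing-state-metadata record and its media-characteristics payload might be organized; all class and field names here are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class MediaCharacteristics:
    """Hypothetical semantic description computed/extracted from the media data."""
    reference_loudness_db: Optional[float] = None
    has_speech: Optional[bool] = None
    has_music: Optional[bool] = None
    keywords: List[str] = field(default_factory=list)   # e.g., recognized spoken words

@dataclass
class ProcessingStateMetadata:
    """Hypothetical record of what processing the media data has already undergone."""
    processing_history: List[str] = field(default_factory=list)   # e.g., ["volume_leveling"]
    processing_params: Dict[str, float] = field(default_factory=dict)
    characteristics: MediaCharacteristics = field(default_factory=MediaCharacteristics)
    third_party_data: Dict[str, str] = field(default_factory=dict)  # e.g., content ID tags

    def was_performed(self, processing_type: str) -> bool:
        # the core query a downstream unit would make before repeating work
        return processing_type in self.processing_history
```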
[00023] Media processing signaling and/or processing state metadata may be embedded in one or more reserved fields (which may, for example, be, but are not limited to, fields not currently in use), carried in a substream of a media bitstream, hidden within the media data, or provided via a separate media processing database. In some possible embodiments, the data volume of the media processing signaling and/or processing state metadata may be small enough to be carried (for example, in reserved fields, hidden in media samples using reversible data-hiding techniques, or stored as detailed processing state information in an external database while media fingerprints are computed from the media data or used to retrieve that information, etc.) without affecting the bit rate allocated to carry the media data. Communicating media processing signaling and/or processing state metadata in an enhanced media processing chain is particularly useful when two or more media processing units need to work in tandem with one another throughout the entire media processing chain (or content lifecycle). Without media processing signaling and/or processing state metadata, severe media processing problems such as quality, level, and spatial degradations are likely to occur, for example, when two or more audio codecs are used in the chain and single-ended volume leveling is applied more than once during the media content's journey to a media consuming device (or a rendering point of the media content in the media data).
[00024] In contrast, the techniques herein elevate the intelligence of any or all of the media processing units in an enhanced media processing chain (content lifecycle). Under the techniques herein, any of these media processing units can both "listen and adapt" and "announce" the state of the media data to downstream media processing units. Thus, under the techniques herein, a downstream media processing unit can optimize its processing of the media data based on knowledge of past processing of the media data as performed by one or more upstream media processing units. Under the techniques herein, media processing by the media processing chain as a whole on the media data becomes more efficient, more adaptive, and more predictable than it otherwise would be. As a result, overall rendering and handling of the media content in the media data is greatly improved.
[00025] Importantly, under the techniques herein, the presence of a state of the media data as indicated by media processing signaling and/or processing state metadata does not negatively impact legacy media processing units that may be present in the enhanced media processing chain and that may themselves not proactively use the state of the media data to adaptively process the media data. Furthermore, even if a legacy media processing unit in the media processing chain might tend to tamper with the processing results of other upstream media processing devices, the processing state metadata herein can be passed securely and reliably to downstream media processing devices through secure communication methods that make use of cryptographic values, encryption, authentication, and data hiding. Examples of data hiding include both reversible and irreversible data hiding.
[00026] In some possible embodiments, to convey the state of media data to downstream media processing units, the techniques herein involve including and/or embedding one or more processing sub-units, in the form of software, hardware, or both, in a media processing unit, to enable the media processing unit to read, write, and/or validate processing state metadata delivered with the media data.
[00027] In some possible embodiments, a media processing unit (e.g., an encoder, decoder, leveler, etc.) may receive media data on which one or more types of media processing have already been performed, but for which: 1) there is no processing state metadata to indicate these previously performed types of media processing, and/or 2) the processing state metadata is incorrect or incomplete. The types of media processing previously performed include operations (e.g., volume leveling) that can change media samples, as well as operations (e.g., fingerprint extraction and/or feature extraction based on media samples) that may not alter the media samples. The media processing unit can be configured to automatically create "correct" processing state metadata that reflects the "true" state of the media data and to associate this state of the media data with the media data by communicating the created processing state metadata to one or more downstream media processing units. Furthermore, the association of media data and processing state metadata can be performed in such a way that the resulting media bitstream is backward compatible with legacy media processing units such as legacy decoders. As a result, legacy decoders that do not implement the techniques herein may still be able to decode the media data correctly, as legacy decoders are designed to do, while ignoring the associated processing state metadata that indicates the state of the media data. In some possible embodiments, the media processing unit herein can at the same time be configured with an ability to validate the processing state metadata with the (source) media data through forensic analysis and/or validation of one or more embedded hash values (e.g., signatures).
[00028] Under techniques as described herein, adaptive processing of media data based on a contemporaneous state of the media data, as indicated by the received processing state metadata, can be performed at various points in a media processing chain. For example, if sound volume metadata in the processing state metadata is valid, then a volume leveling unit subsequent to a decoder can be notified by the decoder, via media processing signaling and/or processing state metadata, so that the volume leveling unit may pass the media data, such as audio, through unchanged.
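As a hedged sketch of this decision (the function names and metadata fields are assumptions for illustration, not the patent's actual interfaces), a volume leveling unit downstream of a decoder might behave as follows:

```python
def level_volume(samples, gain):
    # placeholder for a real volume leveling algorithm
    return [s * gain for s in samples]

def adaptive_leveling(samples, state_metadata, target_gain=0.8):
    """Pass the audio through unchanged when valid upstream metadata says
    volume leveling was already performed; otherwise perform it."""
    if (state_metadata
            and state_metadata.get("loudness_valid")
            and "volume_leveling" in state_metadata.get("processing_history", [])):
        return samples                      # adapt: disable redundant leveling
    return level_volume(samples, target_gain)
```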
[00029] In some embodiments, processing state metadata includes media characteristics extracted from the underlying media samples. The media characteristics can provide a semantic description of the media samples and can be provided as a part of the processing state metadata to indicate, for example, whether the media samples comprise speech or music, whether someone is singing in silent or noisy conditions, whether the singing is over a talking crowd, whether a dialogue is taking place, whether speech is over a noisy background, a combination of two or more of the preceding, etc. Adaptive processing of media data can be performed at various points in a media processing chain based on the description of media characteristics contained in the processing state metadata.
[00030] Under techniques as described herein, processing state metadata embedded in a media bitstream with media data can be authenticated and validated. For example, the techniques herein can be useful to volume regulatory entities for verifying that the volume of a particular program is already within a specified range and that the media data itself has not been modified (thereby ensuring compliance with regulations). A volume value included in a data block comprising the processing state metadata can be read to verify this, instead of re-computing the volume.
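A compliance check of this kind might look like the following sketch (the field names and the regulatory window are hypothetical; real rules, e.g., those referencing ATSC A/85, define their own target and tolerance):

```python
from typing import Optional

def check_loudness_compliance(data_block: dict,
                              low_db: float = -26.0,
                              high_db: float = -22.0) -> Optional[bool]:
    """Read the declared loudness from the data block instead of re-measuring
    the audio; trust it only if the block was authenticated upstream."""
    if not data_block.get("authenticated"):   # e.g., HMAC verified earlier
        return None                           # caller must fall back to re-measuring
    loudness = data_block.get("loudness_db")
    return loudness is not None and low_db <= loudness <= high_db
```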
[00031] Under techniques as described herein, a data block comprising processing state metadata may include additional reserved bits for securely carrying third-party metadata. This feature can be used to enable a variety of applications. For example, a ratings agency (e.g., Nielsen Media Research) may choose to include a content identification tag that can then be used to identify a particular program being watched or listened to, for the purpose of computing statistical viewing or listening ratings.
[00032] Significantly, the techniques described here, and variations of the techniques described here, can ensure that processing state metadata associated with media data is preserved throughout the entire media processing chain, from content creation to content consumption.
[00033] In some possible embodiments, mechanisms as described here form part of a media processing system, including, but not limited to, a handheld device, game machine, television, laptop computer, netbook computer, cellular radiotelephone, electronic book reader, point-of-sale terminal, desktop computer, computer workstation, computer kiosk, and various other types of terminals and media processing units.
[00034] Various modifications to the preferred embodiments and to the generic principles and features described herein will be readily apparent to those skilled in the art. Thus, the description is not intended to be limited to the embodiments shown, but is to be accorded the widest scope consistent with the principles and features described herein.
2. Media Processing Chains
[00035] Figure 1 illustrates an example media processing chain, according to some possible embodiments of the present invention. The media processing chain may, but is not limited to, comprise encoders, decoders, pre-/post-processing units, transcoders, and signal analysis and metadata correction units. These units in the media processing chain may be comprised in the same system or in different systems. In embodiments where the media processing chain spans multiple different systems, these systems may be co-located or geographically distributed.
[00036] In some possible embodiments, a pre-processing unit of Figure 1 can accept PCM (time-domain) samples comprising media content as input and output processed PCM samples. An encoder can accept PCM samples as input and output an encoded (e.g., compressed) media bitstream of the media content.
[00037] As used herein, data comprising the media content (for example, carried in a main stream of the bitstream) is referred to as media data, while separate data, different from the media data, indicating the types of processing performed on the media data at any given point in the media processing chain, is referred to as processing state metadata.
[00038] A signal analysis and metadata correction unit can accept one or more encoded media bitstreams as input and validate whether the processing state metadata included in the encoded media bitstreams is correct by performing signal analysis. If the signal analysis and metadata correction unit finds that the included metadata is invalid, the signal analysis and metadata correction unit replaces the incorrect value with the correct value obtained from the signal analysis.
[00039] A transcoder can accept media bitstreams as input and output a modified media bitstream. A decoder can accept compressed media bitstreams as input and output a stream of decoded PCM samples. A post-processing unit can accept a stream of decoded PCM samples, perform any post-processing on it (such as volume leveling of the media content), and render the media content in the decoded PCM samples on one or more speakers and/or display panels. None of these media processing units is able to adapt the processing it applies to the media data using processing state metadata.
[00040] Techniques as provided herein yield an improved media processing chain in which media processing units such as encoders, decoders, transcoders, pre- and post-processing units, etc., adapt their respective processing to be applied to the media data according to a contemporaneous state of the media data, as indicated by the media processing signaling and/or processing state metadata respectively received by these media processing units.
[00041] Figure 2 illustrates an example of an improved media processing chain comprising encoders, decoders, pre- and post-processing units, transcoders, and signal analysis and metadata correction units, according to some possible embodiments of the present invention. To adapt the processing of media data based on the state of the media data, some or all of the units in Figure 2 may be modified. In some possible embodiments, each of the media processing units in the example enhanced media processing chain is configured to work cooperatively, performing non-redundant media processing and avoiding unnecessary and erroneous repetition of processing that has already been performed by upstream units. In some possible embodiments, the state of the media data at any point in the enhanced media processing chain, from content creation to content consumption, is understood by the current media processing unit at that point in the enhanced media processing chain.
3. Media Processing Devices or Units
[00042] Figure 3 illustrates an example (modified) encoder/transcoder, according to some possible embodiments of the present invention. Unlike the encoders of Figure 1, the encoder/transcoder of Figure 3 can be configured to receive processing state metadata associated with the input media data and to determine preceding (pre-/post-)processing performed, by one or more units upstream relative to the encoder/transcoder, on the input media data (e.g., input audio) that the modified encoder/transcoder logically receives from an upstream unit (e.g., the last upstream unit that performed its processing on the input audio).
[00043] As used herein, the term "logically receives" means that an intermediate unit may or may not be involved in communicating the input media data from an upstream unit (e.g., the last upstream unit) to a receiving unit, such as the encoder/transcoder unit of the present example.
[00044] In one example, the upstream unit that performed the pre-/post-processing on the input media data may be in a different system from the system of which the receiving unit is a part. The input media data may be a media bitstream output by the upstream unit and communicated through an intermediate transmission unit such as a network connection, a USB connection, a wide area network connection, a wireless connection, an optical connection, etc.
[00045] In another example, the upstream unit that performed the pre-/post-processing on the input media data may be in the same system of which the receiving unit is a part. The input media data may be output by the upstream unit and communicated via an internal connection and through one or more internal units of the system. For example, the data may be physically delivered via an internal bus, a crossbar connection, a serial connection, etc. In either case, under the techniques herein, the receiving unit can logically receive the input media data from the upstream unit.
[00046] In some possible embodiments, the encoder/transcoder is configured to create or modify processing state metadata associated with media data which may be a modified version of the input media data. The new or modified processing state metadata created or modified by the encoder/transcoder can automatically and accurately capture the state of the media data that is to be output by the encoder/transcoder further along the media processing chain. For example, the processing state metadata may include whether or not certain processing (e.g., Dolby Volume, Upmixing, commercially available from Dolby Laboratories) has been performed on the media data. Additionally and/or optionally, the processing state metadata may include parameters used in, and/or derived from, the certain processing or any constituent operations within that processing. Additionally and/or optionally, the processing state metadata may include one or more fingerprints computed/extracted from the media data. Additionally and/or optionally, the processing state metadata may include media characteristics of one or more different types computed/extracted from the media data. Media characteristics as described herein provide a semantic description of the media data and may comprise one or more of structural properties, tonality including harmony and melody, timbre, rhythm, reference loudness, stereo mix, a number of sound sources of the media data, absence or presence of voice, repetition characteristics, melody, harmonies, lyrics, timbre, perceptual characteristics, digital media characteristics, stereo parameters, voice recognition (e.g., what a speaker is saying), etc. In some embodiments, the extracted media characteristics are used to classify the underlying media data into one or more of a plurality of media data classes. The one or more media data classes may include, but are not limited to, any of a single overall/dominant "class" (e.g., a class type) for the entire piece of media, and/or a single class that represents a smaller time period (e.g., a class label for a subset/subrange of the entire piece) such as a single media frame or media data block, multiple media frames, multiple media data blocks, a fraction of a second, a second, multiple seconds, etc. For example, a class label may be computed and inserted into the bitstream, and/or hidden via reversible or irreversible data-hiding techniques, every 32 milliseconds of the data stream. A class label can be used to indicate one or more class types and/or one or more class subtypes. In a media data frame, the class label may be inserted into a metadata structure that precedes, or alternatively follows, the block of media data with which the class label is associated, as illustrated in Figure 11. Media classes may include, but are not limited to, any of single class types such as music, speech, noise, silence, and applause. A media processing device as described herein can also be configured to classify media data comprising mixtures of media class types, such as speech over music, etc. Additionally, alternatively and/or optionally, a media processing device as described herein may be configured to carry a separate likelihood/probability value for a media class type or subtype indicated by a computed media class label. One or more such likelihood/probability values can be passed with the media class label in the same metadata structure.
A likelihood/probability value indicates the level of "confidence" that the computed media class label has with respect to the media segment/block for which the media class type or subtype is indicated by the computed media class label. The one or more likelihood/probability values, in combination with the associated media class label, can be used by a receiving media processing device to adapt its media processing in a way that improves any of a wide variety of operations across the entire media processing chain, such as upmixing, encoding, decoding, transcoding, headphone virtualization, etc. Processing state metadata may include, but is not limited to, any of media class types or subtypes and likelihood/probability values. Additionally, optionally or alternatively, rather than passing media class types/subtypes and likelihood/probability values in a metadata structure inserted between media (audio) data blocks, some or all of the media class types/subtypes and likelihood/probability values may be embedded in the media data (or samples) as hidden metadata and passed to a receiving media processing node/device. In some embodiments, the results of content analysis of the media data included in the processing state metadata may comprise one or more indications as to whether certain user-defined or system-defined keywords are spoken in any time segment of the media data. One or more applications may use such indications to trigger the performance of related operations (for example, displaying contextual advertisements for products and services related to the keywords).
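The structure below is a speculative sketch of how such a per-block class label with likelihood/probability values might be represented (the class names follow the list above; everything else is an assumption for illustration):

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict

class MediaClass(Enum):
    MUSIC = "music"
    SPEECH = "speech"
    NOISE = "noise"
    SILENCE = "silence"
    APPLAUSE = "applause"

@dataclass
class ClassLabel:
    """One label per media segment, e.g., one label every 32 ms of audio."""
    start_ms: int
    duration_ms: int
    confidences: Dict[MediaClass, float]   # likelihood/probability per class type

    def dominant(self) -> MediaClass:
        return max(self.confidences, key=self.confidences.get)

# e.g., a 32 ms block classified as speech over music
label = ClassLabel(start_ms=0, duration_ms=32,
                   confidences={MediaClass.SPEECH: 0.70,
                                MediaClass.MUSIC: 0.25,
                                MediaClass.NOISE: 0.05})
assert label.dominant() is MediaClass.SPEECH
```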
[00047] In some embodiments, while processing the media data with a first processor, a device as described here can operate a second processor in parallel to classify the media data and/or extract media characteristics from it. A media characteristic may be extracted from a segment that lasts for a period of time (one frame, multiple frames, one second, multiple seconds, one minute, multiple minutes, a user-defined period of time, etc.), or alternatively for a scene (based on detectable characteristic signal changes). Media characteristics as described by the processing state metadata can be used throughout the entire media processing chain. A downstream device can adapt its own media processing of the media data based on one or more of the media characteristics. Alternatively, a downstream device may choose to ignore the presence of any or all of the media characteristics described in the processing state metadata.
[00048] An application on a device in the media processing chain can leverage the media characteristics in any of a variety of ways. For example, such an application can index the underlying media data using the media characteristics. For a user who wants to jump to the sections where, say, judges are talking about a performance, the application can skip the other, preceding sections. Media characteristics as described in the processing state metadata provide devices downstream of the media data with contextual information as an intrinsic part of the media data.
[00049] More than one device in the media processing chain can perform analysis to extract media characteristics from the media data content. This spares downstream devices from having to analyze the content of the media data.
[00050] In some possible embodiments, the generated or modified processing state metadata can be transmitted as a part of a media bitstream (e.g., an audio bitstream with metadata about the audio state) and adds a transmission rate on the order of 3-10 kbps. In some embodiments, processing state metadata can be transmitted within the media data (e.g., PCM media samples) on the basis of data hiding. A wide variety of data-hiding techniques, which may alter the media data reversibly or irreversibly, can be used to hide some or all of the processing state metadata (including, but not limited to, authentication-related data) in the media samples. Data hiding can be implemented with a perceptible or imperceptible secure communication channel. Data hiding can be accomplished by altering/manipulating/modulating signal characteristics (phase and/or amplitude in a frequency or time domain) of a signal in the underlying media samples. Data hiding can be implemented based on FSK, spread spectrum, or other available methods.
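A rough capacity check (illustrative arithmetic only, assuming 48 kHz mono PCM and one hidden bit per sample) shows why such rates are plausible for LSB-style hiding:

```python
sample_rate_hz = 48_000            # assumed PCM sample rate
hidden_bits_per_sample = 1         # one LSB per sample
capacity_bps = sample_rate_hz * hidden_bits_per_sample   # 48,000 bps of side channel
metadata_rate_bps = 10_000         # upper end of the 3-10 kbps range cited above
print(capacity_bps >= metadata_rate_bps)   # True: the metadata fits with room to spare
```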
[00051] In some possible embodiments, a pre-/post-processing unit may perform processing of the media data in a cooperative manner with the encoder/transcoder. The processing performed by the cooperating pre-/post-processing unit is also specified in the processing state metadata, which is communicated, for example, via the audio bitstream, to a downstream media processing unit.
[00052] In some possible embodiments, once a piece of processing state metadata (which may include media fingerprints and any parameters used in or derived from one or more types of media processing) is derived, this piece of processing state metadata can be preserved by the media processing units in the media processing chain and communicated to all downstream units. Thus, in some possible embodiments, a piece of processing state metadata can be created by the first media processing unit and passed to the last media processing unit, either as data embedded within a media bitstream/substream or as data derivable from an external data source or a media processing database, in the media processing chain (whole lifecycle).
[00053] Figure 4 illustrates an example decoder (for example, an evolution decoder that implements the techniques herein), according to some possible embodiments of the present invention. A decoder in possible embodiments of the present invention can be configured (1) to parse and validate the processing state metadata (e.g., a processing history, a description of media characteristics, etc.) associated with the input media data, and other metadata (e.g., independent of any processing of the media data, such as third-party data, tracking information, identifiers, proprietary or standard information, user annotation data, user preference data, etc.) that have been passed in, and (2) to determine, based on the processing state metadata, the media processing state of the media data. For example, by parsing and validating the processing state metadata in a media bitstream (e.g., an audio bitstream with audio state metadata) that carries the input media data and the processing state metadata, the decoder can determine that the sound volume metadata (or media characteristic metadata) is valid and reliable and was created by one of the enhanced content provider subunits that implement the techniques described here (e.g., the Dolby Media Generator (DMG), commercially available from Dolby Laboratories). In some possible embodiments, in response to determining that the received processing state metadata is valid and reliable, the decoder can be configured to then generate, based at least in part on the received processing state metadata, media processing signaling regarding the state of the media data, using a reversible or irreversible data-hiding technique. The decoder can be configured to provide the media processing signaling to a downstream media processing unit (e.g., a post-processing unit) in the media processing chain. This type of signaling can be used, for example, when there is no dedicated (and synchronous) metadata path between the decoder and the downstream media processing unit. This situation can arise in some possible embodiments where the decoder and the downstream media processing unit exist as separate entities in a consumer electronics device (e.g., PCs, mobile phones, set-top boxes, audio and video recorders, etc.), or in different subsystems or different systems, in which a synchronous control or data path between the decoder and the subsequent processing unit is not available. In some possible embodiments, the media processing signaling under the data-hiding technique herein can be transmitted as a part of the media bitstream and adds a transmission rate on the order of 16 bps. A wide variety of data-hiding techniques that may alter the media data reversibly or irreversibly can be used to hide a portion, or all, of the processing state metadata in the media samples, including, but not limited to, any of perceptible or imperceptible secure communication channels, narrowband or spread-spectrum alterations/manipulations/modulations of signal characteristics (phase and/or amplitude in a frequency or time domain) of one or more signals in the underlying media samples, or other available methods.
[00054] In some possible embodiments, the decoder may not attempt to pass on all of the received processing state metadata; rather, the decoder may embed only enough information (for example, within the limits of the data-hiding capacity) to change the mode of operation of the downstream media processing unit based on the state of the media data.
[00055] In some possible embodiments, redundancy in the audio or video signal in the media data can be exploited to carry the state of the media data. In some possible embodiments, without causing any audible or visible artifacts, some or all of the media processing signaling and/or processing state metadata can be hidden in the least significant bits (LSBs) of a plurality of bytes in the media data, or hidden in a secure communication channel carried within the media data. The plurality of bytes may be selected based on one or more factors or criteria, including whether the LSBs would cause perceptible or audible artifacts when media samples with hidden data are rendered by a legacy media processing unit. Other data-hiding techniques (e.g., perceptible or imperceptible secure communication channels, FSK-based data-hiding techniques, etc.) that may alter the media data reversibly or irreversibly can be used to hide a portion or all of the processing state metadata in the media samples.
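The following is a minimal sketch of plain LSB hiding over 16-bit PCM sample values (payload framing, sample selection criteria, and any reversibility mechanism are omitted; note that simple LSB replacement as shown is irreversible, because the original LSBs are discarded):

```python
def embed_bits_lsb(samples, payload_bits):
    """Hide payload bits in the least significant bits of PCM samples."""
    if len(payload_bits) > len(samples):
        raise ValueError("payload exceeds LSB capacity")
    out = list(samples)
    for i, bit in enumerate(payload_bits):
        out[i] = (out[i] & ~1) | bit      # clear the LSB, then set it to the payload bit
    return out

def extract_bits_lsb(samples, n_bits):
    """Recover the hidden bits from the first n_bits samples."""
    return [s & 1 for s in samples[:n_bits]]

pcm = [1000, -823, 417, 12, -5, 300, 77, -1024]
marked = embed_bits_lsb(pcm, [1, 0, 1, 1])
assert extract_bits_lsb(marked, 4) == [1, 0, 1, 1]
```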
[00056] In some possible embodiments, the data-hiding technique may be optional and may not be necessary, for example, if the downstream media processing unit is implemented as a part of the decoder. For example, two or more media processing units may share a bus or other communication mechanisms that allow metadata to be passed as out-of-band signals, without hiding data in the media samples, from one media processing unit to another.
[00057] Figure 5 illustrates an example post-processing unit (for example, a Dolby evolution post-processing unit), according to some possible embodiments of the present invention. A post-processing unit can be configured to first extract the media processing signaling hidden in the media data (e.g., PCM audio samples with embedded information) to determine the state of the media data as indicated by the media processing signaling. This can be done, for example, with an adjunct processing unit (e.g., an information extraction and audio restoration sub-unit, in some possible embodiments in which the media data comprises audio). In embodiments where the media processing signaling is hidden using a reversible data-hiding technique, the preceding modifications made to the media data by the data-hiding technique (e.g., by the decoder) to embed the media processing signaling can be undone. In embodiments where the media processing signaling is hidden using an irreversible data-hiding technique, the preceding modifications made to the media data by the data-hiding technique (e.g., by the decoder) to embed the media processing signaling cannot be completely undone, but the side effects on the rendered media quality can be minimized (e.g., minimal audio and visual artifacts). Then, based on the state of the media data as indicated by the media processing signaling, the post-processing unit can be configured to adapt the processing it applies to the media data. In one example, volume processing may be turned off in response to a determination (from the media processing signaling) that the sound volume metadata is valid and that volume processing was performed by an upstream unit. In another example, an advertisement or contextual message may be presented or triggered by a voice-recognized keyword.
[00058] In some possible embodiments, a signal analysis and metadata correction unit in a media processing system as described here can be configured to accept encoded media bitstreams as input and to validate, by performing signal analysis, whether the metadata embedded in a media bitstream is correct. After determining whether the embedded metadata is or is not valid within the media bitstream, correction can be applied on an as-needed basis. In some possible embodiments, the signal analysis and metadata correction unit can be configured to perform analyses on media data or encoded samples of the input media bitstreams, in the time and/or frequency domain, to determine media characteristics of the media data. After media characteristics are determined, corresponding processing state data, for example a description of one or more media characteristics, can be generated and provided to devices downstream of the signal analysis and metadata correction unit. In some possible embodiments, the signal analysis and metadata correction unit can be integrated with one or more other media processing units in one or more media processing systems. Additionally and/or optionally, the signal analysis and metadata correction unit can be configured to hide media processing signaling in the media data and to signal to a downstream unit (encoder/transcoder/decoder) that the embedded metadata in the media data is valid and has been successfully verified. In some possible embodiments, signaling data and/or processing state metadata associated with the media data can be generated and inserted into a compressed media bitstream that carries the media data.
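As a hedged sketch of this validate-and-correct behavior (a real unit would use a proper loudness measure such as ITU-R BS.1770; the RMS proxy and field names here are assumptions):

```python
import math

def measured_loudness_db(samples):
    """Crude RMS level in dB relative to full scale, assuming samples in [-1.0, 1.0]."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(max(rms, 1e-9))

def validate_and_correct(samples, metadata, tolerance_db=1.0):
    """Keep the embedded loudness value if signal analysis agrees with it;
    otherwise replace it with the value obtained from the analysis."""
    measured = measured_loudness_db(samples)
    declared = metadata.get("loudness_db")
    if declared is None or abs(declared - measured) > tolerance_db:
        metadata = dict(metadata, loudness_db=measured)   # corrected copy
    return dict(metadata, loudness_valid=True)            # signal downstream: verified
```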
[00059] Therefore, techniques as described here ensure that different processing blocks or media processing units in an enhanced media processing chain (e.g., encoders, transcoders, decoders, pre-/post-processing units, etc.) are able to determine the state of the media data. Hence, each of the media processing units can adapt its processing according to the state of the media data, as indicated by upstream units. In addition, one or more data-hiding techniques, reversible or irreversible, can be used to ensure that signaling information regarding the state of the media data can be provided to downstream media processing units efficiently and with the minimum bit rate required to transmit the signaling information to the downstream media processing units. This is especially useful where there is no metadata path between an upstream unit such as a decoder and a downstream unit such as a post-processing unit, for example, where the post-processing unit is not part of the decoder.
[00060] In some possible embodiments, an encoder may be enhanced with, or may comprise, a metadata pre-processing and validation sub-unit. In some possible embodiments, the metadata pre-processing and validation sub-unit can be configured to ensure that the encoder performs adaptive processing of the media data based on the state of the media data, as indicated by media processing signaling and/or processing state metadata. In some possible embodiments, through the metadata pre-processing and validation sub-unit, the encoder can be configured to validate the processing state metadata associated with (for example, included in a media bitstream with) the media data. For example, if the metadata is validated to be trustworthy, then the results from a type of media processing previously performed can be reused, and a new performance of that type of media processing can be avoided. On the other hand, if the metadata is found to be false, then the type of media processing purportedly performed previously can be repeated by the encoder. In some possible embodiments, additional types of media processing can be performed by the encoder on the media data once the processing state metadata (including media processing signaling and fingerprint-based metadata retrieval) is found to be unreliable.
[00061] If the processing state metadata is determined to be valid (for example, based on a match between an extracted cryptographic value and a reference cryptographic value), the encoder can also be configured to signal to other downstream media processing units in the enhanced media processing chain that the processing state metadata, for example present in the media bitstream, is valid. Any, some, or all of a variety of approaches can be implemented by the encoder.
[00062] Under a first approach, the encoder can insert a flag (e.g., an "evolution flag") into an encoded media bitstream to indicate that validation of the processing state metadata has already been performed on this encoded media bitstream. The flag can be inserted in such a way that its presence does not affect a legacy media processing unit, such as a decoder that is not configured to process and make use of processing state metadata as described here. In an example embodiment, an Audio Compression-3 (AC-3) encoder can be enhanced with a metadata pre-processing and validation sub-unit to set an "evolution flag" in the xbsi2 fields of an AC-3 media bitstream, as specified in the ATSC specifications (e.g., ATSC A/52b). This bit may be present in every encoded frame carried in the AC-3 media bitstream and may be unused for other purposes. In some possible embodiments, the presence of this flag in the xbsi2 field does not affect already-deployed legacy decoders that are not configured to process and make use of processing state metadata as described here.
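Mechanically, setting and testing such a flag amounts to simple bit operations on the reserved field; the sketch below assumes a bit position for illustration only (the actual field layout is defined by ATSC A/52b):

```python
EVOLUTION_FLAG_MASK = 0x01   # assumed bit position within the reserved field

def set_evolution_flag(reserved_field: int) -> int:
    """Set the flag without disturbing the other bits of the field."""
    return reserved_field | EVOLUTION_FLAG_MASK

def has_evolution_flag(reserved_field: int) -> bool:
    return bool(reserved_field & EVOLUTION_FLAG_MASK)
```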
[00063] Under the first approach, there may be an issue with authenticating the information in the xbsi2 fields. For example, a (e.g., malicious) upstream unit may be able to turn the xbsi2 field "on" without actually validating the processing state metadata, and may incorrectly signal to other downstream units that the processing state metadata is valid.
[00064] To solve this issue, some embodiments of the present invention may use a second approach. A secure data-hiding method, including, but not limited to, any of a number of data-hiding methods that create a secure communication channel within the media data itself, such as spread-spectrum-based methods, FSK-based methods, and other methods based on a secure communication channel, etc., can be used to embed the "evolution flag". This secure method prevents the evolution flag from being passed in plain text and thus from being easily attacked, intentionally or unintentionally, by a unit or an intruder. Instead, under this second approach, a downstream unit can retrieve the hidden data in an encrypted form. Through a sub-process of decryption and authentication, the downstream unit can verify the correctness of the hidden data and trust the "evolution flag" in the hidden data. As a result, the downstream unit can determine that the processing state metadata in the media bitstream was previously successfully validated. In various embodiments, any portion of the processing state metadata, such as the "evolution flag", can be delivered by an upstream device to downstream devices using any of one or more cryptographic methods (HMAC-based or non-HMAC-based).
[00065] In some possible embodiments, the media data may initially be simply legacy media bitstreams, e.g., comprising PCM samples. However, once the media data is processed by one of the media processing units as described here, the processing state metadata generated by the one or more media processing units comprises the state of the media data as well as relatively detailed information (including, but not limited to, any of one or more media characteristics determined from the media data) that can be used to decode the media data. In some possible embodiments, the generated processing state metadata may include media fingerprints such as video fingerprints, sound volume metadata, dynamic range metadata, one or more hash-based message authentication codes (HMACs), one or more dialog channels, audio fingerprints, an enumerated processing history, audio loudness, dialog loudness, true peak values, sample peak values, and/or any user-specified (third-party) metadata. The processing state metadata may comprise an "evolution data block".
[00066] As used herein, the term "enhanced" refers to an ability of media processing units under the techniques described here to work with other media processing units or other media processing systems under the techniques described here in such a way that they can perform adaptive processing based on the state of the media data as set by upstream units. The term "evolution" refers to an ability of media processing units under the techniques described here to work in a manner compatible with legacy media processing units or legacy media processing systems, as well as to the ability of the media processing units under the techniques here to work with other media processing units or other media processing systems under the techniques described here in such a way that they can perform adaptive processing based on the state of the media data as described by upstream units.
[00067] In some possible embodiments, a media processing unit as described here may receive media data on which one or more types of media processing have been performed, but for which there is no metadata, or insufficient metadata, associated with the media data to indicate the one or more types of media processing. In some possible embodiments, such a media processing unit can be configured to create processing state metadata to indicate the one or more types of media processing that have been performed by other units upstream of the media processing unit. Feature extraction that has not been done by upstream devices can also be performed, and the resulting processing state metadata forwarded to downstream devices. In some possible embodiments, a media processing unit, for example an evolution encoder/transcoder, may comprise a media forensic analysis sub-unit. A media forensic sub-unit, such as an audio forensic sub-unit, can be configured to determine (without any received metadata) whether a certain type of processing has been performed on a piece of media content or on the media data. The analysis sub-unit can be configured to look for specific signal processing artifacts/traces introduced and left behind by the certain type of processing. The media forensic sub-unit can also be configured to determine whether a certain type of feature extraction has been performed on a piece of media content or on the media data. The analysis sub-unit can be configured to look for the presence of specific feature-based metadata. For the purposes of the present invention, the media forensic analysis sub-unit as described here can be implemented by any media processing unit in a media processing chain. In addition, processing state metadata created by a media processing unit through the media forensic sub-unit can be delivered to a downstream unit in the media processing chain here.
[00068] In some possible embodiments, processing state metadata as described here may include additional reserved bytes to support third-party applications. The additional reserved bytes can be kept secure by allocating a separate encryption key to scramble any plain text to be carried in one or more fields of the reserved bytes. Embodiments of the present invention support innovative applications that include content identification and tracking. In one example, media with Nielsen ratings might carry a unique identifier for a program in a media bitstream. Nielsen ratings could then use this unique identifier to compute viewing or listening statistics for the program. In another example, the reserved bytes here could carry keywords for a search engine such as Google. Google could then place advertisements based on keywords included in one or more fields of the reserved bytes that carry the keywords. For the purposes of the present invention, in applications such as those discussed here, the techniques here can be used to ensure that the reserved bytes are secure and cannot be decrypted by anyone other than the third party designated to use one or more fields of the reserved bytes.
[00069] Processing state metadata as described here can be associated with media data in any of a number of different ways. In some possible embodiments, processing state metadata can be inserted into the outgoing compressed media bitstream that carries the media data. In some embodiments, the metadata is inserted in such a way as to maintain backward compatibility with legacy decoders that are not configured to perform adaptive processing based on the processing state metadata here.
4. Example Adaptive Media Data Processing
[00070] Figure 6 illustrates an example implementation of an encoder/transcoder, according to some possible embodiments of the present invention. Any of the depicted components can be implemented as one or more processes and/or one or more IC circuits (including ASICs, FPGAs, etc.), in hardware, software, or a combination of hardware and software. The encoder/transcoder may comprise a number of legacy sub-units such as a front-end decoder (FED), a back-end decoder (full mode) that does not choose whether to perform dynamic range control/dialog normalization (DRC/dialnorm) processing based on whether such processing has already been done, a DRC generator (DRC Gen), a back-end encoder (BEE), a compressor (stuffer), a CRC regeneration unit, etc. With these legacy sub-units, the encoder/transcoder would be able to convert a bitstream (which, for example, may be, but is not limited to, AC-3) to another bitstream comprising the results of one or more types of media processing (which, for example, may be, but is not limited to, E-AC-3 with adaptive and automated sound volume processing). However, the media processing (e.g., sound volume processing) would be performed regardless of whether sound volume processing had been performed previously, whether the media data of the input bitstream comprises the results of such preceding sound volume processing, and whether processing state metadata is in the input bitstream. Thus, an encoder/transcoder with only the legacy sub-units could perform erroneous or unnecessary media processing.
[00071] Under the techniques described here, in some possible embodiments as shown in Figure 6, the encoder/transcoder may comprise any of a plurality of new sub-units such as a media data parser/validator (which, for example, may be, but is not limited to, an AC-3 flag parser and validator), adjunct media processing (e.g., a real-time, adaptive transform-domain sound volume and dynamic range controller, signal analysis, feature extraction, etc.), media fingerprint generation (e.g., audio fingerprint generation), a metadata generator (e.g., an evolution data generator and/or other metadata generators), media processing signaling insertion (e.g., "add_bsi" insertion, or insertion into auxiliary data fields), an HMAC generator (which can digitally sign one or more, even all, frames to prevent tampering by malicious or legacy entities), one or more other types of cryptographic processing units, and one or more switches that operate on the basis of processing state signaling and/or processing state metadata (e.g., a "state" flag received from the flag parser and validator, or flags for media features, etc.). In addition, user input (e.g., a user target loudness/dialnorm) and/or other input (e.g., from a video fingerprinting process) and/or other metadata input (e.g., one or more types of third-party data, tracking information, identifiers, proprietary and/or standard information, user annotation data, user preference data, etc.) can be received by the encoder/transcoder. As illustrated, measured dialog, gated and ungated loudness, and dynamic range values can also be input to the evolution data generator. Other media-characteristic-related information can also be injected into a processing unit as described here to generate a portion of the processing state metadata.
[00072] In one or more of some possible embodiments, processing state metadata as described here is carried in the "add_bsi" fields specified in the Enhanced AC-3 (E-AC-3) syntax per ATSC A/52b, or in one or more auxiliary data fields in a media bitstream as described here. In some possible embodiments, carrying processing state metadata in these fields does not adversely impact the compressed media stream's frame size and/or bit rate.
[00073] In some possible embodiments, processing state metadata may be included in a dependent or independent substream associated with a main program media bitstream. The advantage of this approach is that the bit rate allocated to encoding the media data carried by the main program media bitstream is not affected. If the processing state metadata is carried as a part of encoded frames, then the bits allocated to encoding audio information may be reduced so that the compressed media stream's frame size and/or bit rate can remain unchanged. For example, the processing state metadata may comprise a reduced data rate representation and take up a low data rate on the order of 10 kbps for transmission between media processing units. Hence, media data such as audio samples may be encoded at a rate reduced by 10 kbps to accommodate the processing state metadata.
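By way of illustration only, the following sketch works through the frame bit-budget tradeoff just described. The 640 kbps total rate and 32 ms frame duration are assumed values for the example, not figures taken from the embodiments above; only the 10 kbps metadata rate comes from the text.

```python
# Sketch of the frame bit-budget tradeoff: reserving a low-rate metadata
# channel inside a fixed-rate compressed stream. All rates are illustrative.

TOTAL_RATE_BPS = 640_000     # hypothetical total compressed stream rate
METADATA_RATE_BPS = 10_000   # low-rate processing state metadata channel (from the text)
FRAME_DURATION_S = 0.032     # hypothetical frame duration (32 ms)

bits_per_frame = TOTAL_RATE_BPS * FRAME_DURATION_S
metadata_bits_per_frame = METADATA_RATE_BPS * FRAME_DURATION_S
audio_bits_per_frame = bits_per_frame - metadata_bits_per_frame

print(f"frame budget:   {bits_per_frame:.0f} bits")
print(f"metadata share: {metadata_bits_per_frame:.0f} bits")
print(f"audio share:    {audio_bits_per_frame:.0f} bits "
      f"({TOTAL_RATE_BPS - METADATA_RATE_BPS} bps)")
```

As the arithmetic shows, the frame size and total bit rate stay unchanged; only the audio coding rate is reduced to make room for the metadata.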
[00074] In some possible embodiments, at least a portion of processing state metadata can be embedded with media data (or samples) through reversible or irreversible data hiding techniques. The advantage of this approach is that media samples and metadata can be received by downstream devices in the same bitstream.
[00075] In some possible embodiments, processing state metadata can be stored in a media processing database linked to media fingerprints. A media processing unit downstream of an upstream unit, such as an encoder/decoder that creates the processing state metadata, can create a fingerprint from the received media data and then use the fingerprint as a key to query the media processing database. Once the processing state metadata in the database is located, a data block comprising the processing state metadata associated with (or for) the received media data can be retrieved from the media processing database and made available to the downstream media processing unit. As used herein, "fingerprints" may include, but are not limited to, any of one or more media fingerprints generated to indicate media characteristics.
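For illustration, a minimal sketch of this fingerprint-keyed lookup follows. The SHA-256 hash stands in for a real media fingerprint purely for simplicity (actual media fingerprints are robust perceptual hashes, not exact hashes), and the in-memory dictionary and field names are hypothetical.

```python
import hashlib

# Toy media processing database keyed by a media "fingerprint". A plain
# SHA-256 over raw samples is used only so the example runs; real media
# fingerprints tolerate transcoding and other benign changes.

media_processing_db: dict[str, dict] = {}

def fingerprint(media_samples: bytes) -> str:
    return hashlib.sha256(media_samples).hexdigest()

def store_state(media_samples: bytes, state_metadata: dict) -> None:
    """Upstream unit: store a data block keyed by the media fingerprint."""
    media_processing_db[fingerprint(media_samples)] = state_metadata

def lookup_state(media_samples: bytes) -> dict | None:
    """Downstream unit: retrieve the processing state metadata, if present."""
    return media_processing_db.get(fingerprint(media_samples))

# Example round trip (field names are hypothetical)
audio = b"\x00\x01" * 512
store_state(audio, {"loudness_processed": True, "dialnorm": -24})
print(lookup_state(audio))
```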
[00076] In some possible embodiments, a data block comprising processing state metadata comprises a cryptographic hash value (HMAC) for the processing state metadata and/or the underlying media data. Since the data block is expected to be digitally signed in these arrangements, a downstream media processing unit can relatively easily authenticate and validate the processing state metadata. Other cryptographic methods, including, but not limited to, any one or more non-HMAC cryptographic methods, may be used for secure transmission and reception of the processing state metadata and/or the underlying media data.
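A minimal sketch of HMAC signing and validation of such a data block is shown below. The shared key, its distribution, and the metadata fields are assumptions made for the example; the embodiments above do not prescribe them.

```python
import hashlib
import hmac
import json

# Sketch: sign a processing state metadata block (plus the media payload it
# describes) with an HMAC, and validate it downstream. Key management is out
# of scope here; the key below is a hypothetical shared credential.

SHARED_KEY = b"example-shared-secret"

def sign_block(metadata: dict, media_payload: bytes) -> bytes:
    msg = json.dumps(metadata, sort_keys=True).encode() + media_payload
    return hmac.new(SHARED_KEY, msg, hashlib.sha256).digest()

def validate_block(metadata: dict, media_payload: bytes, tag: bytes) -> bool:
    expected = sign_block(metadata, media_payload)
    return hmac.compare_digest(expected, tag)  # constant-time comparison

meta = {"loudness_processed": True, "dialnorm": -31}   # hypothetical fields
payload = b"...frame audio data..."
tag = sign_block(meta, payload)
assert validate_block(meta, payload, tag)
print("processing state metadata authenticated")
```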
[00077] As previously described, a media processing unit such as an encoder/transcoder as described here can be configured to accept legacy media bitstreams and PCM samples. If the input media bitstream is a legacy media bitstream, the media processing unit can check for an evolution flag, which may be in the media bitstream or may be hidden in the media data, by means of an enhanced legacy decoder comprising the pre-processing and metadata validation logic described previously. In the absence of an evolution flag, the encoder is configured to perform adaptive processing and to generate processing state metadata, as appropriate, in an output media bitstream or in a data block comprising the processing state metadata. For example, as shown in Figure 6, an example unit such as the real-time loudness and dynamic range controller in the transform domain can adaptively process audio content in the input media data received by the unit and automatically adjust loudness and dynamic range if an "evolution flag" is missing from the input media data or source media bitstreams. Additionally, optionally or alternatively, another unit may make use of feature-based metadata to perform adaptive processing.
[00078] In example embodiments as illustrated in Figure 6, the encoder can be aware of a post/pre-processing unit that has performed a type of media processing (for example, loudness domain processing) and hence can create processing state metadata, in a data block, that includes the specific parameters used in and/or derived from the loudness domain processing. In some possible embodiments, the encoder can create processing state metadata that reflects the processing history of the content in the media data, as long as the encoder is aware of the types of processing that have been performed (e.g., loudness domain processing on the content of the media data). Additionally, optionally or alternatively, the encoder may perform adaptive processing based on one or more media characteristics described by the processing state metadata. Additionally, optionally or alternatively, the encoder can perform analysis of the media data and generate a description of media characteristics as a part of the processing state metadata to be provided to any of the other processing units.
[00079] In some possible embodiments, a decoder using the techniques described here is able to determine the state of the media data in the following scenarios.
[00080] Under a first scenario, if the decoder receives a media bitstream with the "evolution flag" set to indicate the validity of the processing state metadata in the media bitstream, the decoder can parse and/or retrieve the processing state metadata and signal a downstream media processing unit, such as an appropriate post-processing unit, accordingly. On the other hand, if the "evolution flag" is absent, then the decoder can signal to the downstream media processing unit that volume leveling processing should still be performed, since the loudness metadata (which, for example, would have been included in the processing state metadata in some possible embodiments had volume leveling processing already been performed) is either absent or cannot be trusted to be valid.
[00081] Under a second scenario, if the decoder receives a media bitstream generated and encoded by an upstream media processing unit such as an evolution encoder with cryptographic hash support, the decoder can parse and retrieve the cryptographic hash value from a data block comprising the processing state metadata and use the cryptographic hash value to validate the received media bitstream and the associated metadata. For example, if the decoder finds the associated metadata (for example, loudness metadata in the processing state metadata) to be valid based on a match between a reference cryptographic hash value and the cryptographic hash value retrieved from the data block, then the decoder can signal a downstream media processing unit, such as a volume leveling unit, to pass the media data, such as audio, through unchanged. Additionally, optionally or alternatively, other types of cryptographic techniques can be used in place of a method based on a cryptographic hash value. Additionally, optionally or alternatively, different volume leveling operations can also be performed based on one or more media characteristics of the media data as described in the processing state metadata.
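To make the first two scenarios concrete, here is a small sketch of the decoder-side decision. The field names ("evolution_flag", "hmac", the returned signal strings) are hypothetical labels invented for the example, not names defined by the embodiments.

```python
# Sketch of the decoder's signaling decision for scenarios 1 and 2 above.
# All field names and signal strings are illustrative placeholders.

def decide_leveling(frame: dict, reference_tag: bytes | None) -> str:
    """Return the signal sent to the downstream volume leveling unit."""
    if not frame.get("evolution_flag"):
        # Scenario 1: no evolution flag, so loudness metadata is absent
        # or cannot be trusted; leveling should still be performed.
        return "PERFORM_LEVELING"
    if reference_tag is not None and frame.get("hmac") != reference_tag:
        # Scenario 2: hash mismatch, so the metadata is not valid.
        return "PERFORM_LEVELING"
    # Metadata present and validated: pass the audio through unchanged.
    return "PASS_THROUGH_UNCHANGED"

print(decide_leveling({"evolution_flag": True, "hmac": b"tag"}, b"tag"))
```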
[00082] Under a third scenario, the decoder may receive a media bitstream generated by an upstream media processing unit such as the evolution encoder, but a data block comprising processing state metadata is not included in the media bitstream; instead, the data block is stored in a media processing database. The decoder is configured to create a fingerprint of the media data in the media bitstream, such as the audio, and then use the fingerprint to query the media processing database. The media processing database can return the appropriate data block associated with the received media data based on a fingerprint match. In some possible embodiments, the encoded media bitstream contains a simple universal resource locator (URL) to direct the decoder to send the fingerprint-based query, as discussed previously, to the media processing database.
[00083] In all these scenarios, the decoder is configured to determine the state of the media and to signal a downstream media processing unit to adapt the latter's processing of the media data accordingly. In some possible embodiments, the media data here can be re-encoded after being decoded. In some possible embodiments, a data block comprising contemporary processing state information corresponding to the re-encoding can be passed on to a media processing unit downstream of the decoder, such as a subsequent encoder/transcoder. For example, the data block can be included as associated metadata in the output media bitstream from the decoder.
[00084] Figure 7 illustrates an example evolution decoder that controls the modes of operation of a volume leveling unit based on the validity of loudness metadata in and/or associated with processing state metadata, according to some possible embodiments of the present invention. Other operations, such as feature-based processing, can also be handled. Any of the depicted components can be implemented as one or more processes and/or one or more IC circuits (including ASICs and FPGAs), in hardware, software, or a combination of hardware and software. The decoder may comprise a number of legacy subunits such as a frame information module (e.g., a frame information module in AC-3, MPEG AAC, MPEG HE AAC, E-AC-3, etc.), a front-end decoder (e.g., a FED in AC-3, MPEG AAC, MPEG HE AAC, E-AC-3, etc.), synchronization and conversion (e.g., a sync and convert module in AC-3, MPEG AAC, MPEG HE AAC, E-AC-3, etc.), a frame set buffer, a back-end decoder (e.g., a BED in AC-3, MPEG AAC, MPEG HE AAC, E-AC-3, etc.), CRC regeneration, media transformation (e.g., Dolby Volume), etc. With these legacy subunits the decoder would be able to deliver media content in the media data to a downstream media processing unit and/or transform the media content. However, the decoder would not be able to communicate the state of the media data or to provide media processing signaling and/or processing state metadata in the output bitstream.
[00085] Under the techniques here, in some possible embodiments as illustrated in Figure 7, the decoder can comprise any of a plurality of new subunits such as metadata handling (evolution data and/or other metadata inputs, including one or more types of third-party data, tracking information, identifiers, proprietary or standard information, user annotation data, user preference data, feature extraction, feature handling, etc.), secure communication (e.g., tamper proofing), processing state information communication (e.g., an HMAC generator and signature validator, other cryptographic techniques), media fingerprint extraction (e.g., audio and video fingerprint extraction), adjunct media processing (e.g., speech channel/loudness information, other types of media characteristics), data hiding (e.g., PCM data hiding, which may be destructive/irreversible or reversible), media processing signaling insertion (which may, for example, include "add_bsi" insertion or insertions into one or more auxiliary data fields), HMAC generation and other cryptographic techniques, hidden data retrieval and validation (e.g., a PCM hidden data retriever and validator), "undo" data hiding, and one or more switches that operate based on processing state signaling and/or processing state metadata (e.g., "evolution data valid" and data hiding insertion control from the HMAC generator and signature validator), etc. As illustrated, information extracted by the HMAC generator and signature validator and by the audio and video fingerprint extraction can be output to, or used for, audio and video synchronization correction, ratings, media rights, quality control, media localization processing, feature-based processing, etc.
[00086] In some possible embodiments, a post/pre-processing unit in a media processing chain does not operate independently. Instead, the post/pre-processing unit can interact with an encoder or a decoder in the media processing chain. In the case of interacting with an encoder, the post/pre-processing unit can help to create at least a portion of the processing state metadata about the state of the media data in a data block. In the case of interacting with a decoder, the post/pre-processing unit is configured to determine the state of the media data and to adapt its processing of the media data accordingly. In an example in Figure 7, an example post/pre-processing unit such as a volume leveling unit can retrieve the data hidden in PCM samples sent by an upstream decoder and determine, based on the hidden data, whether or not the loudness metadata is valid. If the loudness metadata is valid, input media data such as audio can be passed unchanged through the volume leveling unit. In another example, an example post/pre-processing unit can retrieve the data hidden in PCM samples sent by an upstream decoder and determine, based on the hidden data, one or more types of media characteristics previously determined from the content of the media samples. If a speech-recognized keyword is indicated, the post/pre-processing unit can perform one or more specific operations relating to the recognized keyword.
5. Data Hiding
[00087] Figure 8 illustrates an example configuration using data hiding to pass media processing information, according to some possible embodiments of the present invention. In some possible embodiments, data hiding can be used to enable signaling between an upstream media processing unit, such as an evolution encoder or decoder (e.g., audio processing unit #1), and a downstream media processing unit, such as a post/pre-processing unit (e.g., audio processing unit #2), where there is no metadata path between the upstream and downstream media processing units.
[00088] In some possible embodiments, reversible media data hiding (e.g., reversible audio data hiding) can be used to modify media data samples (e.g., X) in the media data into modified media data samples (e.g., X') that carry media processing signaling and/or processing state metadata between the two media processing units. In some possible embodiments, the modification to the media data samples described here is done in such a way that there is no perceptible degradation as a result of the modification. Thus, even if there were no other media processing unit subsequent to media processing unit #1, no audible or visible artifacts would be perceived in the modified media data samples. In other words, hiding the media processing signaling and/or processing state metadata in a perceptually transparent manner would not cause any audible or visible artifacts when the audio and video in the modified media data samples are transformed.
[00089] In some possible embodiments, a media processing unit (e.g., audio processing unit #2 of Figure 8) retrieves the embedded media processing signaling and/or processing state metadata from the modified media data samples, and restores the modified media data samples to the original media data samples by undoing the modifications. This can be done, for example, through a subunit (e.g., information extraction and audio restoration). The retrieved embedded information can then serve as a signaling mechanism between the two media processing units (for example, audio processing units #1 and #2 in Figure 8). The robustness of the data hiding technique here may depend on what types of processing can be performed by the media processing units. An example of media processing unit #1 might be a digital decoder in a set-top box, while an example of media processing unit #2 might be a volume leveling unit in the same set-top box. If the decoder determines that the loudness metadata is valid, the decoder can use a reversible data hiding technique to signal the subsequent volume leveling unit not to apply leveling.
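The following toy sketch illustrates the reversible embed/extract/restore round trip over integer PCM samples. A one-bit-per-sample LSB channel is used purely for clarity (the embodiments do not specify a particular hiding technique, and practical systems use perceptually shaped, more robust methods); the function names are hypothetical.

```python
# Toy reversible LSB data hiding over integer PCM samples. The original LSBs
# are preserved so the receiver can restore the exact original samples
# (returned separately here for simplicity; a real scheme would carry them
# inside the hidden payload itself).

def embed(samples: list[int], signal_bits: list[int]):
    assert len(signal_bits) <= len(samples)
    original_lsbs = [s & 1 for s in samples[:len(signal_bits)]]
    modified = list(samples)
    for i, bit in enumerate(signal_bits):
        modified[i] = (modified[i] & ~1) | bit   # overwrite the LSB
    return modified, original_lsbs

def extract_and_restore(modified: list[int], original_lsbs: list[int]):
    signal_bits = [s & 1 for s in modified[:len(original_lsbs)]]
    restored = list(modified)
    for i, bit in enumerate(original_lsbs):
        restored[i] = (restored[i] & ~1) | bit   # put the original LSB back
    return signal_bits, restored

pcm = [1000, -2001, 302, 77]
marked, saved = embed(pcm, [1, 0, 1])            # e.g., "leveling already done"
bits, restored = extract_and_restore(marked, saved)
assert restored == pcm and bits == [1, 0, 1]
```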
[00090] In some possible embodiments, irreversible media data hiding (for example, a secure communication channel based on an irreversible data hiding technique) can be used to modify media data samples (e.g., X) in the media data into modified media data samples (e.g., X') that carry media processing signaling and/or processing state metadata between the two media processing units. In some possible embodiments, the modification to the media data samples described here is done in such a way that there is minimal perceptible degradation as a result of the modification. Thus, at most minimal audible or visible artifacts may be perceived in the modified media data samples. In other words, hiding the media processing signaling and/or processing state metadata in a perceptually transparent manner would cause only minimal audible or visible artifacts when the audio and video in the modified media data samples are transformed.
[00091] In some possible embodiments, modifications of media data samples made through irreversible data hiding cannot be undone to recover the original media data samples.
6. Process flow example
[00092] Figure 9A and Figure 9B illustrate example process flows according to possible embodiments of the present invention. In some possible embodiments, one or more computing devices or units in a media processing system can carry out these process flows.
[00093] In block 910 of Figure 9A, a first device in a media processing chain (for example, an enhanced media processing chain as described here) determines whether a type of media processing has been performed on an output version of media data. The first device can be a part or all of a media processing unit. In block 920, in response to determining that the type of media processing has been performed on the output version of the media data, the first device can create a state of the media data. In some possible embodiments, the state of the media data can specify the type of media processing, the result of which is included in the output version of the media data. The first device can communicate, to a second device downstream in the media processing chain, the output version of the media data and the state of the media data, for example, in an output media bitstream or in an auxiliary metadata bitstream associated with a separate media bitstream that carries the output version of the media data.
[00094] In some possible embodiments, the media data comprises one or more of: audio content only, video content only, or both audio content and video content.
[00095] In some possible embodiments, the first device may provide to the second device the state of the media data as one or more of: (a) media fingerprints, (b) processing state metadata, or (c) media processing signaling.
[00096] In some possible embodiments, the first device can store a media processing data block in a media processing database. The media processing data block may comprise media processing metadata, and the media processing data block may be retrievable based on one or more media fingerprints that are associated with the media processing data block.
[00097] In some possible embodiments, the state of the media data comprises a cryptographic hash value encrypted with credential information. The cryptographic hash value can be authenticated by a receiving device.
[00098] In some embodiments, at least a portion of the state of the media data comprises one or more secure communication channels hidden in the media data, and the one or more secure communication channels are to be authenticated by a receiving device. In an example embodiment, the one or more secure communication channels may comprise at least one spread-spectrum secure communication channel. In an example embodiment, the one or more secure communication channels comprise at least one frequency shift keying secure communication channel.
[00099] In some possible embodiments, the state of the media data comprises one or more sets of parameters that are used in and/or derived from the media processing type.
[000100] In some possible embodiments, at least one of the first device and the second device comprises one or more of: pre-processing units, encoders, media processing subunits, transcoders, decoders, post-processing units, or media content transformation subunits. In an example embodiment, the first device is an encoder (e.g., an AVC encoder) while the second device is a decoder (e.g., an AVC decoder).
[000101] In some possible embodiments, the type of processing is performed by the first device, while in some other possible embodiments the type of processing is instead performed by a device upstream of the first device in the media processing chain.
[000102] In some possible embodiments, the first device can receive an input version of the media data. The input version of the media data does not comprise any state of the media data that indicates the type of media processing. In these arrangements, the first device can analyze the input version of the media data to determine the type of media processing that has already been performed on the input version of the media data.
[000103] In some possible embodiments, the first device encodes loudness and dynamic range values of the media data into the state of the media data.
[000104] In some possible embodiments, the first device can adaptively avoid performing the type of media processing that has already been performed by an upstream device. However, even when the type of media processing has already been performed, the first device can receive a command to override the result produced by the upstream device: the first device can be commanded to still perform the type of media processing, for example, with the same or different parameters. The first device can communicate, to a second device downstream in the media processing chain, an output version of the media data that includes the result of the type of media processing performed by the first device under the command, and a state of the media data that indicates that the type of media processing has already been performed on the output version of the media data. In various possible embodiments, the first device may receive the command from one of: (a) user input, (b) a system configuration of the first device, (c) signaling from a device external to the first device, or (d) signaling from a subunit within the first device.
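A minimal sketch of this "avoid unless commanded" behavior follows. The state representation (a "performed" set), the `forced` flag, and the placeholder processing function are all assumptions made for the example.

```python
# Sketch: skip a type of media processing if the state says it was already
# done upstream, unless a command forces it to be performed anyway.

def apply_processing(media, processing_type):
    # Placeholder for the actual processing (e.g., volume leveling).
    return media

def process(media, state: dict, processing_type: str, forced: bool = False):
    already_done = processing_type in state.get("performed", set())
    if already_done and not forced:
        return media, state  # adaptively avoid redundant processing
    media = apply_processing(media, processing_type)
    state.setdefault("performed", set()).add(processing_type)
    return media, state

media, state = b"audio", {"performed": {"volume_leveling"}}
media, state = process(media, state, "volume_leveling")               # skipped
media, state = process(media, state, "volume_leveling", forced=True)  # command overrides
```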
[000105] In some embodiments, the state of the media data comprises at least a portion of state metadata hidden in one or more secure communication channels.
[000106] In some embodiments, the first device alters a plurality of bytes in the media data to store at least a portion of the state of the media data.
[000107] In some embodiments, at least one of the first device and the second device comprises one or more of: Advanced Television Systems Committee (ATSC) codecs, Moving Picture Experts Group (MPEG) codecs, Audio Codec 3 (AC-3) codecs, and Enhanced AC-3 codecs.
[000108] In some embodiments, the media processing chain comprises: a pre-processing unit configured to accept time-domain samples comprising media content as input and to output processed time-domain samples; an encoder configured to output a compressed media bitstream of the media content based on the processed time-domain samples; a signal analysis and metadata correction unit configured to validate processing state metadata in the compressed media bitstream; a transcoder configured to modify the compressed media bitstream; a decoder configured to output decoded time-domain samples based on the compressed media bitstream; and a post-processing unit configured to perform post-processing of the media content in the decoded time-domain samples. In some embodiments, at least one of the first device and the second device comprises one or more of the pre-processing unit, the signal analysis and metadata correction unit, the transcoder, the decoder, and the post-processing unit. In some embodiments, at least one of the pre-processing unit, the signal analysis and metadata correction unit, the transcoder, the decoder, and the post-processing unit performs adaptive processing of the media content based on processing state metadata received from an upstream device. A structural sketch of this chain is given below.
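The sketch below shows only the shape of the chain; every stage body is a placeholder, and the shared `state` dictionary stands in for the processing state metadata each stage receives from its upstream device.

```python
# Structural sketch of the media processing chain enumerated above.
# Stage names follow the text; implementations are intentionally empty.

def pre_process(media, state):          return media, state
def encode(media, state):               return media, state
def analyze_and_correct(media, state):
    state["metadata_validated"] = True  # signal analysis and metadata correction
    return media, state
def transcode(media, state):            return media, state
def decode(media, state):               return media, state
def post_process(media, state):
    if "volume_leveling" in state.get("performed", set()):
        return media, state             # adapt: leveling already done upstream
    return media, state

media, state = b"time-domain samples", {"performed": {"volume_leveling"}}
for stage in (pre_process, encode, analyze_and_correct,
              transcode, decode, post_process):
    media, state = stage(media, state)
```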
[000109] In some embodiments, the first device determines one or more media characteristics from the media data and includes a description of the one or more media characteristics in the state of the media data. The one or more media characteristics may comprise at least one media characteristic determined from one or more of frames, seconds, minutes, user-definable time intervals, scenes, songs, pieces of music, and recordings. The one or more media characteristics may comprise a semantic description of the media data. In various embodiments, the one or more media characteristics comprise one or more of: structural properties, tonality including harmony and melody, timbre, rhythm, loudness, stereo mix, a number of sound sources of the media data, absence or presence of voice, repetition characteristics, melody, harmonies, lyrics, timbre and perceptual characteristics, digital media characteristics, stereo parameters, or one or more portions of speech content.
[000110] In block 950 of Figure 9B, a first device in a media processing chain (for example, an enhanced media processing chain as described here) determines whether a type of media processing has already been performed on an input version of media data.
[000111] In block 960, in response to determining that the type of media processing has already been performed on the input version of the media data, the first device adapts its processing of the media data to disable performing the type of media processing at the first device. In some possible embodiments, the first device can turn off one or more types of media processing based on an input state of the media data.
[000112] In some possible embodiments, the first device communicates, to a second device downstream in the media processing chain, an output version of the media data and a state of the media data that indicates that the type of media processing has already been performed on the output version of the media data.
[000113] In some possible embodiments, the first device can encode loudness and dynamic range values of the media data into the state of the media data. In some possible embodiments, the first device can automatically perform one or more of adaptive corrective loudness processing or adaptive dynamic range audio processing, based at least in part on whether the type of media processing has already been performed on the input version of the media data.
[000114] In some possible embodiments, the first device can perform a second, different type of media processing on the media data. The first device can communicate, to a second device downstream in the media processing chain, an output version of the media data and a state of the media data that indicates that the type of media processing and the second, different type of media processing have already been performed on the output version of the media data.
[000115] In some possible embodiments, the first device can retrieve an input state of the media data that is associated with the input version of the media data. In some possible embodiments, the input state of the media data is carried with the input version of the media data in an input media bitstream. In some possible embodiments, the first device can extract the input state of the media data from data units, in the media data, that encode the media content. The input state of the media data can be hidden in one or more of the data units.
[000116] In some possible embodiments, the first device can recover a version of the data units that does not comprise the input state of the media data and transform the media content based on the recovered version of the data units.
[000117] In some possible embodiments, the first device can authenticate the input state of the media data by validating a cryptographic hash value associated with the input state of the media data.
[000118] In some embodiments, the first device authenticates the input state of the media data by validating one or more fingerprints associated with the input state of the media data, wherein the one or more fingerprints are generated based on at least a portion of the media data.
[000119] In some embodiments, the first device validates the media data by validating one or more fingerprints associated with the input state of the media data, wherein the one or more fingerprints are generated based on at least a portion of the media data.
[000120] In some possible embodiments, the first device can receive the input state of the media data as described by processing state metadata. The first device can create media processing signaling based on at least part of the processing state metadata. The media processing signaling can indicate the input state of the media data even though the media processing signaling may have a smaller data volume and/or require a lower bit rate than the processing state metadata. The first device can transmit the media processing signaling to a media processing device downstream of the first device in the media processing chain. In some possible embodiments, the media processing signaling is hidden in one or more data units in an output version of the media data using a reversible data hiding technique, such that one or more modifications to the media data are removable by a receiving device. In some embodiments, the media processing signaling is hidden in one or more data units in an output version of the media data using an irreversible data hiding technique, such that at least one of the one or more modifications to the media data is not removable by a receiving device.
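To illustrate how signaling can be far smaller than the metadata it summarizes, here is a sketch that collapses a metadata block into a few flag bits. The flag assignments and metadata fields are hypothetical; nothing in the embodiments fixes a particular encoding.

```python
# Sketch: derive compact media processing signaling (a few bits) from a
# richer processing state metadata block. Flag positions are illustrative.

FLAGS = {"volume_leveling": 0b001,
         "dynamic_range_control": 0b010,
         "upmixing": 0b100}

def to_signaling(processing_state_metadata: dict) -> int:
    """Collapse rich metadata into per-processing-type signaling bits."""
    bits = 0
    for name in processing_state_metadata.get("performed", ()):
        bits |= FLAGS.get(name, 0)
    return bits

signaling = to_signaling({"performed": ["volume_leveling"], "dialnorm": -24})
print(f"media processing signaling: {signaling:#05b}")  # 3 bits vs. a full block
```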
[000121] In some embodiments, the first device determines one or more media characteristics based on a description of the one or more media characteristics in the state of the media data. The one or more media characteristics may comprise at least one media characteristic determined from one or more of frames, seconds, minutes, user-definable time intervals, scenes, songs, pieces of music, and recordings. The one or more media characteristics may comprise a semantic description of the media data. In some embodiments, the first device performs one or more specific operations in response to determining the one or more media characteristics.
[000122] In some possible embodiments, a method is provided that comprises: computing, with a first device in a media processing chain, one or more reduced data rate representations of a source frame of media data; and carrying the one or more reduced data rate representations simultaneously and securely, within a state of the media data itself, to a second device in the media processing chain, wherein the method is performed by one or more computing devices.
[000123] In some possible embodiments, the one or more reduced data rate representations are carried in at least one of a substream, one or more reserved fields, an add_bsi field, one or more auxiliary data fields, or one or more transformation coefficients.
[000124] In some possible embodiments, the one or more reduced data rate representations comprise synchronization data used to synchronize audio and video delivered within the media data.
[000125] In some possible embodiments, the one or more reduced data rate representations comprise media fingerprints (a) generated by a media processing unit and (b) embedded with the media data, for one or more of quality monitoring, media ratings, media tracking, or content searching.
[000126] In some possible embodiments, the method further comprises computing and transmitting, by at least one of the one or more computing devices in the media processing chain, a cryptographic hash value based on the media data and/or the state of the media data, within one or more encoded bitstreams that carry the media data.
[000127] In some possible embodiments, the method further comprises: authenticating, by a receiving device, the cryptographic hash value; signaling, by the receiving device, to one or more downstream media processing units a determination of whether the state of the media data is valid; and signaling, by the receiving device, to the one or more downstream media processing units the state of the media data in response to determining that the state of the media data is valid.
[000128] In some possible embodiments, the cryptographic hash value representing the media data and/or the state of the media data is carried in at least one of a substream, one or more reserved fields, an add_bsi field, one or more auxiliary data fields, or one or more transformation coefficients.
[000129] In some possible embodiments, a method is provided that comprises: adaptively processing, by one or more computing devices in a media processing chain comprising one or more of psycho-acoustic units, transforms, waveform/spatial audio coding units, encoders, decoders, transcoders, or stream processors, an input version of media data, based on a past history of loudness processing of the media data by one or more upstream media processing units as indicated by a state of the media data; and normalizing loudness and/or dynamic range of an output version of the media data, at the end of the media processing chain, to consistent loudness and/or dynamic range values.
[000130] In some possible embodiments, the consistent loudness value comprises a loudness value (1) controlled or selected by a user, or (2) signaled adaptively by a state in the input version of the media data.
[000131] In some possible embodiments, the loudness value is computed over the dialog (speech) portions of the media data.
[000132] In some possible embodiments, the loudness value is computed over absolute, relative, and/or ungated portions of the media data.
[000133] In some possible embodiments, the consistent dynamic range value comprises a dynamic range value (1) controlled or selected by a user, or (2) signaled adaptively by a state in the input version of the media data.
[000134] In some possible embodiments, the dynamic range value is computed over the dialog (speech) portions of the media data.
[000135] In some possible embodiments, the dynamic range value is computed over absolute, relative, and/or ungated portions of the media data.
[000136] In some possible embodiments, the method further comprises: computing one or more loudness and/or dynamic range gain control values for normalizing the output version of the media data to the consistent loudness value and the consistent dynamic range value; and simultaneously carrying the one or more loudness and/or dynamic range gain control values within a state of the output version of the media data at the end of the media processing chain, wherein the one or more loudness and/or dynamic range gain control values are usable by another device to reverse-apply the one or more loudness and/or dynamic range gain control values so as to recover an original loudness value and an original dynamic range of the input version of the media data.
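For illustration, the following sketch normalizes loudness with a single gain, carries that gain in the state, and then reverse-applies it to recover the original signal. The dB values and field name are invented for the example; real systems would carry separate loudness and dynamic range gain control values.

```python
# Sketch: reversible loudness normalization. The applied gain travels in the
# state of the output version so a downstream device can undo it exactly.

def normalize_loudness(samples, measured_db: float, target_db: float):
    gain_db = target_db - measured_db
    gain = 10 ** (gain_db / 20.0)
    state = {"loudness_gain_db": gain_db}  # carried with the output version
    return [s * gain for s in samples], state

def undo_normalization(samples, state):
    gain = 10 ** (-state["loudness_gain_db"] / 20.0)
    return [s * gain for s in samples]

audio = [0.10, -0.20, 0.05]
out, state = normalize_loudness(audio, measured_db=-18.0, target_db=-24.0)
restored = undo_normalization(out, state)
assert all(abs(a - b) < 1e-12 for a, b in zip(audio, restored))
```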
[000137] In some possible embodiments, the one or more loudness and/or dynamic range gain control values representing the state of the output version of the media data are carried in at least one of a substream, one or more reserved fields, an add_bsi field, one or more auxiliary data fields, or one or more transformation coefficients.
[000138] In some possible embodiments, a method is provided that comprises performing one or more of inserting, extracting, or editing of related and unrelated media data locations and/or a state of related and unrelated media data locations within one or more encoded bitstreams, by one or more computing devices in a media processing chain comprising one or more of psycho-acoustic units, transforms, waveform/spatial audio coding units, encoders, decoders, transcoders, or stream processors.
[000139] In some possible embodiments, the one or more related and unrelated media data locations and/or the state of related and unrelated media data locations within the encoded bitstreams are carried in at least one of a substream, one or more reserved fields, an add_bsi field, one or more auxiliary data fields, or one or more transformation coefficients.
[000140] In some possible embodiments, a method is provided that comprises performing one or more of inserting, extracting, or editing of related and unrelated media data and/or a state of related and unrelated media data within one or more encoded bitstreams, by one or more computing devices in a media processing chain comprising one or more of psycho-acoustic units, transforms, waveform/spatial audio coding units, encoders, decoders, transcoders, or stream processors.
[000141] In some possible embodiments, the one or more related and unrelated media data and/or the state of related and unrelated media data within the encoded bitstreams are carried in at least one of a substream, one or more reserved fields, an add_bsi field, one or more auxiliary data fields, or one or more transformation coefficients.
[000142] In some possible embodiments, a media processing system is configured to compute and carry cryptographic hash values based on media data and/or a state of the media data within one or more encoded bitstreams, by one or more computing devices in a media processing chain comprising one or more of psycho-acoustic units, transforms, waveform/spatial audio coding units, encoders, decoders, transcoders, or stream processors.
[000143] As used herein, the term "related and unrelated media data locations" may refer to information that may include a media resource locator, such as an absolute path, a relative path, and/or a URL indicating the location of related media (e.g., a copy of the media in a different bitstream format), or an absolute path, relative path, and/or URL indicating the location of unrelated media, or another type of information that is not directly related to the essence or to the bitstream in which the media data location is carried (e.g., the location of a new piece of media such as a commercial, a web page, etc.).
[000144] As used herein, the term "state of related and unrelated media data locations" may refer to the validity of the related and unrelated media locations (as these can be edited/updated throughout the entire life cycle of the bitstreams in which they are carried).
[000145] As used herein, "related media data" may refer to the carriage of related media data in the form of secondary media data bitstreams highly correlated with the primary media data that the bitstream represents (for example, carrying a copy of the media data in a second, independent bitstream format). In the context of unrelated media data, this information could refer to the carriage of secondary media data bitstreams that are independent of the primary media data.
[000146] As used herein, "state" for related media data may refer to any signaling information (processing history, updated target loudness, etc.) and/or metadata, as well as the validity of the related media data. "State" for unrelated media data may refer to independent signaling information and/or metadata, including validity information, that can be carried separately (independently) from the state of the "related" media data. The state of unrelated media data represents media data that is unrelated to the media data bitstream in which this information is found, as this information can be independently edited/updated throughout the entire life cycle of the bitstreams in which it is carried.
[000147] As used herein, the phrase "absolute, relative, and/or ungated portions of the media data" relates to gating of loudness and/or level measurements performed on the media data. Gating refers to a specific loudness or level threshold, where computed values that exceed the threshold are included in the final measurement (for example, ignoring short-term loudness values below -60 dBFS in the final measured value). Gating on an absolute value refers to a fixed level, whereas gating on a relative value refers to a value that is dependent on the current ungated measurement value.
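The sketch below illustrates absolute gating only, using the -60 dBFS threshold from the example above. The block size and the simple dB averaging are simplifications made for the example; real measurements average power across blocks and follow standards such as ITU-R BS.1770.

```python
import math

# Sketch of absolute gating in a loudness measurement: short-term blocks
# below a fixed threshold are excluded from the final measured value.

def block_level_dbfs(block):
    rms = math.sqrt(sum(s * s for s in block) / len(block))
    return -120.0 if rms == 0 else 20.0 * math.log10(rms)

def gated_loudness_dbfs(blocks, absolute_gate_db=-60.0):
    levels = [block_level_dbfs(b) for b in blocks]
    kept = [lv for lv in levels if lv > absolute_gate_db]  # absolute gating
    return sum(kept) / len(kept) if kept else float("-inf")

blocks = [[0.10] * 480, [0.0001] * 480, [0.20] * 480]  # middle block gated out
print(f"gated loudness: {gated_loudness_dbfs(blocks):.1f} dBFS")
```

A relative gate would instead be set at a fixed offset below the ungated measurement, so its threshold moves with the content.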
[000148] Figure 12A through Figure 12L illustrate block diagrams of some example media processing nodes/devices according to some embodiments of the present invention.
[000149] As illustrated in Figure 12A, a signal processor (which may be node 1 of N nodes) is configured to receive an input signal that may comprise PCM audio samples. The PCM audio samples may or may not contain processing state metadata (or media state metadata) hidden between the PCM audio samples. The signal processor of Figure 12A may comprise a media state metadata extractor that is configured to decode, extract, and/or interpret the processing state metadata from the PCM audio samples as provided by one or more media processing units prior to the signal processor of Figure 12A. At least a part of the processing state metadata can be provided to an audio encoder in the signal processor of Figure 12A to adapt processing parameters for the audio encoder. In parallel, an audio analysis unit in the signal processor of Figure 12A can analyze the media content conveyed in the input signal. Feature extraction, media classification, loudness estimation, fingerprint generation, etc., can be implemented as part of the analysis performed by the audio analysis unit. At least a portion of the results of this analysis can be provided to the audio encoder in the signal processor of Figure 12A to adapt processing parameters for the audio encoder. The audio encoder encodes the PCM audio samples in the input signal into an encoded bitstream in an output signal based on the processing parameters. An encoded bitstream analysis unit in the signal processor of Figure 12A can be configured to determine whether the media data or samples in the encoded bitstream to be transmitted in the output signal of the signal processor of Figure 12A have space to store at least a portion of the processing state metadata. The new processing state metadata to be transmitted by the signal processor of Figure 12A comprises some or all of the processing state metadata extracted by the media state metadata extractor, the processing state metadata generated by the audio analysis unit and a media state metadata generator of the signal processor of Figure 12A, and/or any third-party data. If it is determined that the media data or encoded bitstream samples have space to store at least a portion of the processing state metadata, some or all of the new processing state metadata can be stored as hidden data in the media data or samples of the output signal. Additionally, optionally or alternatively, a part or all of the new processing state metadata can be stored in a separate metadata structure apart from the media data and samples in the output signal. Thus, the output signal may comprise an encoded bitstream that contains the new processing state metadata (or "media state") carried within and/or between the media samples (essence) via a hidden or non-hidden secure communication channel.
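A structural sketch of the Figure 12A node follows. Every function is a placeholder standing in for a unit named above (extractor, analysis unit, encoder, capacity check, data hiding); none of the bodies is actual codec logic, and all names and return values are hypothetical.

```python
# Structural sketch of the Figure 12A signal processor (node 1 of N).

def extract_media_state(signal):  return {"performed": {"volume_leveling"}}
def analyze_audio(signal):        return {"loudness_db": -24.0, "fingerprint": "abc"}
def adapt_parameters(state, analysis):
    return {"skip_leveling": "volume_leveling" in state["performed"]}
def encode_audio(signal, params): return b"encoded-frames"
def has_room(bitstream, state):   return len(bitstream) >= 8       # capacity check
def hide_state(bitstream, state): return bitstream + b"|hidden-media-state"

def node_12a(input_signal):
    state = extract_media_state(input_signal)      # media state metadata extractor
    analysis = analyze_audio(input_signal)         # runs in parallel in the figure
    bitstream = encode_audio(input_signal, adapt_parameters(state, analysis))
    new_state = {**state, **analysis}              # plus generator output / third-party data
    if has_room(bitstream, new_state):
        return hide_state(bitstream, new_state), None  # state hidden in the stream
    return bitstream, new_state                    # state in a separate structure

print(node_12a(b"pcm-audio-samples"))
```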
[000150] As illustrated in Figure 12B, a signal processor (which may be node 1 of N nodes) is configured to receive an input signal that may comprise PCM audio samples. The PCM audio samples may or may not contain processing state metadata (or media state metadata) hidden between the PCM audio samples. The signal processor of Figure 12B may comprise a media state metadata extractor that is configured to decode, extract, and/or interpret the processing state metadata from the PCM audio samples as provided by one or more media processing units prior to the signal processor of Figure 12B. At least a portion of the processing state metadata can be provided to a PCM audio sample processor in the signal processor of Figure 12B to adapt processing parameters for the PCM audio sample processor. In parallel, an audio analysis unit in the signal processor of Figure 12B can analyze the media content conveyed in the input signal. Feature extraction, media classification, loudness estimation, fingerprint generation, etc., can be implemented as a part of the analysis performed by the audio analysis unit. At least a portion of the results of this analysis can be provided to the PCM audio sample processor in the signal processor of Figure 12B to adapt its processing parameters. The PCM audio sample processor processes the PCM audio samples in the input signal into a PCM audio bitstream (samples) in the output signal based on the processing parameters. A PCM audio analysis unit in the signal processor of Figure 12B can be configured to determine whether the media data or samples in the PCM audio bitstream to be transmitted in the output signal of the signal processor of Figure 12B have space to store at least a portion of the processing state metadata. The new processing state metadata to be transmitted by the signal processor of Figure 12B comprises some or all of the processing state metadata extracted by the media state metadata extractor, the processing state metadata generated by the audio analysis unit and a media state metadata generator of the signal processor of Figure 12B, and/or any third-party data. If it is determined that the media data or PCM audio bitstream samples have space to store at least a portion of the processing state metadata, some or all of the new processing state metadata can be stored as hidden data in the media data or samples of the output signal. Additionally, optionally or alternatively, some or all of the new processing state metadata can be stored in a separate metadata structure apart from the media data and samples in the output signal.
[000151] Thus, the output signal may comprise a PCM audio bitstream that contains the new processing state metadata (or "media state") carried within and/or between the media samples (essence) via a hidden or non-hidden secure communication channel.
[000152] As illustrated in Figure 12C, a signal processor (which may be node 1 of N nodes) is configured to receive an input signal that may comprise a PCM audio bitstream (samples). The PCM audio bitstream may contain processing state metadata (or media state metadata) carried within and/or between the media samples (essence) in the PCM audio bitstream via a hidden or non-hidden secure communication channel. The signal processor of Figure 12C may comprise a media state metadata extractor that is configured to decode, extract, and/or interpret the processing state metadata from the PCM audio bitstream. At least a portion of the processing state metadata can be provided to a PCM audio sample processor in the signal processor of Figure 12C to adapt processing parameters for the PCM audio sample processor. The processing state metadata may include a description of media characteristics, media class types or subtypes, or likelihood/probability values, as determined by one or more media processing units prior to the signal processor of Figure 12C, which the signal processor of Figure 12C can be configured to use without performing its own media content analysis. Additionally, optionally or alternatively, the media state metadata extractor can be configured to extract third-party data from the input signal and transmit the third-party data to a downstream processing node/entity/device. In one embodiment, the PCM audio sample processor processes the PCM audio bitstream into PCM audio samples in the output signal based on processing parameters adapted based on the processing state metadata provided by the one or more media processing units prior to the signal processor of Figure 12C.
[000153] As illustrated in Figure 12D, a signal processor (which may be node 1 of N nodes) is configured to receive an input signal that may comprise an encoded audio bitstream containing processing state metadata (or media state metadata) carried within and/or hidden between the media samples via a hidden or non-hidden secure communication channel. The signal processor of Figure 12D may comprise a media state metadata extractor that is configured to decode, extract, and/or interpret the processing state metadata from the encoded bitstream as provided by one or more media processing units prior to the signal processor of Figure 12D. At least a part of the processing state metadata can be provided to an audio decoder in the signal processor of Figure 12D to adapt processing parameters for the audio decoder. In parallel, an audio analysis unit in the signal processor of Figure 12D can analyze the media content conveyed in the input signal. Feature extraction, media classification, loudness estimation, fingerprint generation, etc., can be implemented as a part of the analysis performed by the audio analysis unit. At least a part of the results of this analysis can be provided to the audio decoder in the signal processor of Figure 12D to adapt processing parameters for the audio decoder. The audio decoder decodes the encoded audio bitstream in the input signal into a PCM audio bitstream in an output signal based on the processing parameters. A PCM audio analysis unit in the signal processor of Figure 12D can be configured to determine whether the media data or samples in the PCM audio bitstream have space to store at least a portion of the processing state metadata. The new processing state metadata to be transmitted by the signal processor of Figure 12D comprises some or all of the processing state metadata extracted by the media state metadata extractor, the processing state metadata generated by the audio analysis unit and a media state metadata generator of the signal processor of Figure 12D, and/or any third-party data. If it is determined that the media data or samples in the PCM audio bitstream have space to store at least a portion of the processing state metadata, some or all of the new processing state metadata can be stored as hidden data in the media data or samples of the output signal. Additionally, optionally or alternatively, a part or all of the new processing state metadata can be stored in a separate metadata structure apart from the media data and samples in the output signal. Thus, the output signal may comprise a PCM audio bitstream (samples) that contains processing state metadata (or "media state") carried within and/or between the media data/samples (essence) via a hidden or non-hidden secure communication channel.
[000154] As illustrated in Figure 12E, a signal processor (which may be node 1 of N nodes) is configured to receive an input signal that may comprise an encoded audio bitstream. The encoded audio bitstream may contain processing state metadata (or media state metadata) carried within and/or between the media samples (essence) in the encoded audio bitstream via a hidden or non-hidden secure communication channel. The signal processor of Figure 12E may comprise a media state metadata extractor that is configured to decode, extract, and/or interpret the processing state metadata from the encoded audio bitstream. At least a part of the processing state metadata can be provided to an audio decoder in the signal processor of Figure 12E to adapt processing parameters for the audio decoder. The processing state metadata may include a description of media characteristics, media class types or subtypes, or likelihood/probability values, as determined by one or more media processing units prior to the signal processor of Figure 12E, which the signal processor of Figure 12E can be configured to use without performing its own media content analysis. Additionally, optionally or alternatively, the media state metadata extractor can be configured to extract third-party data from the input signal and transmit the third-party data to a downstream processing node/entity/device. In one embodiment, the audio decoder decodes the encoded audio bitstream into PCM audio samples in an output signal based on processing parameters adapted based on the processing state metadata provided by the one or more media processing units prior to the signal processor of Figure 12E.
[000155] As illustrated in Figure 12F, a signal processor (which may be node 1 of N nodes) is configured to receive an input signal that may comprise an encoded audio bitstream containing processing state metadata (or media state metadata) carried within and/or hidden between the media samples via a hidden or non-hidden secure communication channel. The signal processor of Figure 12F may comprise a media state metadata extractor that is configured to decode, extract, and/or interpret the processing state metadata from the encoded bitstream as provided by one or more media processing units prior to the signal processor of Figure 12F. At least a portion of the processing state metadata can be provided to a bitstream transcoder (or encoded audio bitstream processor) in the signal processor of Figure 12F to adapt processing parameters for the bitstream transcoder. In parallel, an audio analysis unit in the signal processor of Figure 12F can analyze the media content conveyed in the input signal. Feature extraction, media classification, loudness estimation, fingerprint generation, etc., can be implemented as a part of the analysis performed by the audio analysis unit. At least a part of the results of this analysis can be provided to the bitstream transcoder in the signal processor of Figure 12F to adapt processing parameters for the bitstream transcoder. The bitstream transcoder transcodes the encoded audio bitstream in the input signal into an encoded audio bitstream in an output signal based on the processing parameters. An encoded bitstream analysis unit in the signal processor of Figure 12F can be configured to determine whether the media data or samples in the encoded audio bitstream have space to store at least a portion of the processing state metadata. The new processing state metadata to be transmitted by the signal processor of Figure 12F comprises some or all of the processing state metadata extracted by the media state metadata extractor, the processing state metadata generated by the audio analysis unit and a media state metadata generator of the signal processor of Figure 12F, and/or any third-party data. If it is determined that the media data or samples in the encoded audio bitstream have space to store at least a portion of the processing state metadata, some or all of the new processing state metadata can be stored as hidden data in the media data or samples in the output signal. Additionally, optionally or alternatively, a part or all of the new processing state metadata can be stored in a separate metadata structure apart from the media data in the output signal. Thus, the output signal may comprise an encoded audio bitstream that contains processing state metadata (or "media state") carried within and/or between the media data/samples (essence) via a hidden or non-hidden secure communication channel.
[000156] Figure 12G illustrates a configuration example partly similar to that of Figure 12A. Additionally, optionally or alternatively, the signal processor of Figure 12G may comprise a media state metadata extractor that is configured to query a local and/or external media state metadata database, which can be operatively linked to the signal processor of Figure 12G via an intranet and/or the Internet. A query sent by the signal processor of Figure 12G to the database may include one or more fingerprints associated with the media data, one or more names associated with the media data (e.g., a song title or a movie title), or any other types of identifying information associated with the media data. Based on the information in the query, matching media state metadata stored in the database can be located and provided to the signal processor of Figure 12G. The media state metadata can be included in the processing state metadata provided by the media state metadata extractor to downstream processing nodes/entities, such as an audio encoder. Additionally, optionally or alternatively, the signal processor of Figure 12G may comprise a media state metadata generator that is configured to provide any generated media state metadata and/or associated identifying information, such as fingerprints, names, and/or other types of identifying information, to the local and/or external media state metadata database, as illustrated in Figure 12G. Additionally, optionally or alternatively, one or more portions of the media state metadata stored in the database can be provided to the signal processor of Figure 12G to be communicated to a downstream media processing node/device within and/or between media samples (essence) via a hidden or non-hidden secure communication channel.
[000157] Figure 12H illustrates a configuration example partly similar to that of Figure 12B. Additionally, optionally or alternatively, the signal processor of Figure 12H may comprise a media state metadata extractor that is configured to query a local and/or external media state metadata database, which can be operatively linked to the signal processor of Figure 12H via an intranet and/or the Internet. A query sent by the signal processor of Figure 12H to the database may include one or more fingerprints associated with the media data, one or more names associated with the media data (for example, a song title or a movie title), or any other types of identifying information associated with the media data. Based on the information in the query, matching media state metadata stored in the database can be located and provided to the signal processor of Figure 12H. The media state metadata can be included in the processing state metadata provided by the media state metadata extractor to downstream processing nodes/entities, such as a PCM audio sample processor. Additionally, optionally or alternatively, the signal processor of Figure 12H may comprise a media state metadata generator that is configured to provide any generated media state metadata and/or associated identifying information, such as fingerprints, names, and/or other types of identifying information, to the local and/or external media state metadata database, as illustrated in Figure 12H. Additionally, optionally or alternatively, one or more portions of the media state metadata stored in the database can be provided to the signal processor of Figure 12H to be communicated to a downstream media processing node/device within and/or between media samples (essence) via a hidden or non-hidden secure communication channel.
[000158] Figure 12I illustrates an example configuration that is similar in part to that of Figure 12C. Additionally, optionally or alternatively, the signal processor of Figure 12I may comprise a media state metadata extractor that is configured to query a local and/or external media state metadata database, which can be operatively linked to the signal processor of Figure 12I via an intranet and/or the Internet. A query sent by the signal processor of Figure 12I to the database may include one or more fingerprints associated with the media data, one or more names associated with the media data (e.g., a song title or a film title), or any other types of identifying information associated with the media data. Based on the information in the query, matching media state metadata stored in the database can be located and provided to the signal processor of Figure 12I. The media state metadata can be provided to downstream processing nodes/entities such as a PCM audio sample processor.
[000159] Figure 12J illustrates an example configuration that is similar in part to that of Figure 12D. Additionally, optionally or alternatively, the signal processor of Figure 12J may comprise a media state metadata extractor that is configured to query a local and/or external media state metadata database, which can be operatively linked to the signal processor of Figure 12J via an intranet and/or the Internet. A query sent by the signal processor of Figure 12J to the database may include one or more fingerprints associated with the media data, one or more names associated with the media data (e.g., a song title or a film title), or any other types of identifying information associated with the media data. Based on the information in the query, matching media state metadata stored in the database can be located and provided to the signal processor of Figure 12J. The media state metadata from the database can be included in processing state metadata provided to downstream processing nodes/entities such as an audio decoder. Additionally, optionally or alternatively, the signal processor of Figure 12J may comprise an audio analysis unit that is configured to provide any generated media state metadata and/or associated identifying information, such as fingerprints, names and/or other types of identifying information, to a local and/or external media state metadata database, as illustrated in Figure 12J. Additionally, optionally or alternatively, one or more portions of the media state metadata stored in the database may be provided to the signal processor of Figure 12J to be communicated to a downstream media processing node/device within and/or between media samples (essence) via a hidden or non-hidden secure communication channel.
[000160] Figure 12K illustrates an example configuration that is similar in part to that of Figure 12F. Additionally, optionally or alternatively, the signal processor of Figure 12K may comprise a media state metadata extractor that is configured to query a local and/or external media state metadata database, which can be operatively linked to the signal processor of Figure 12K via an intranet and/or the Internet. A query sent by the signal processor of Figure 12K to the database may include one or more fingerprints associated with the media data, one or more names associated with the media data (e.g., a song title or a film title), or any other types of identifying information associated with the media data. Based on the information in the query, matching media state metadata stored in the database can be located and provided to the signal processor of Figure 12K. The media state metadata from the database can be included in processing state metadata provided to downstream processing nodes/entities, such as a bitstream transcoder or an encoded audio bitstream processor. Additionally, optionally or alternatively, one or more portions of the media state metadata stored in the database may be provided to the signal processor of Figure 12K to be communicated to a downstream media processing node/device within and/or between media samples (essence) via a hidden or non-hidden secure communication channel.
[000161] Figure 12L illustrates a node 1 signal processor and a node 2 signal processor according to an example embodiment. The node 1 signal processor and the node 2 signal processor can be part of an overall media processing chain. In some embodiments, the node 1 signal processor adapts media processing based on processing state metadata received by the node 1 signal processor, while the node 2 signal processor adapts media processing based on processing state metadata received by the node 2 signal processor. The processing state metadata received by the node 2 signal processor may comprise processing state metadata and/or media state metadata added by the node 1 signal processor after the node 1 signal processor analyzes the content of the media data; as a result, the node 2 signal processor can directly make use of the metadata provided by the node 1 signal processor in media processing, without repeating any or all of the analyses previously performed by the node 1 signal processor.
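The hand-off between the two nodes can be pictured with the following minimal sketch, which is illustrative only; the function names and the metadata layout are assumptions rather than part of the specification:

    # Hypothetical node 1 / node 2 hand-off: node 2 reuses the analysis result
    # recorded by node 1 instead of repeating it.
    def node1_process(media: str, state: dict):
        if "content_analysis" not in state:
            # Expensive content analysis, performed once at node 1.
            state["content_analysis"] = {"class": "speech", "confidence": 0.9}
        return media, state

    def node2_process(media: str, state: dict):
        analysis = state.get("content_analysis")
        if analysis is None:
            # No upstream metadata: node 2 would have to analyze locally.
            analysis = {"class": "unknown", "confidence": 0.0}
        # Adapt processing directly from the upstream result.
        return f"processed({media}, mode={analysis['class']})", state

    media, state = node1_process("pcm-samples", {})
    print(node2_process(media, state)[0])  # processed(pcm-samples, mode=speech)

7. Implementation Mechanisms - Hardware Overview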
[000162] According to an embodiment, the techniques described herein are implemented by means of one or more special-purpose computing devices. The special-purpose computing devices can be hard-wired to perform the techniques, or can include digital electronic devices such as one or more application-specific integrated circuits (ASICs) or field-programmable gate arrays (FPGAs) that are persistently programmed to perform the techniques, or may include one or more general-purpose hardware processors programmed to perform the techniques following instructions in firmware, memory, other storage, or a combination thereof. Such special-purpose computing devices can also combine custom hard-wired logic, ASICs, or FPGAs with custom programming to perform the techniques. The special-purpose computing devices can be desktop computer systems, portable computer systems, handheld devices, networked devices, or any other device that incorporates hard-wired and/or program logic to implement the techniques.
[000163] For example, Figure 10 is a block diagram illustrating a computer system 1000 on which an embodiment of the invention can be implemented. Computer system 1000 includes a bus 1002 or other communication mechanism for communicating information and a hardware processor 1004 coupled to bus 1002 for processing information. Hardware processor 1004 can be, for example, a general purpose microprocessor.
[000164] Computer system 1000 also includes a main memory 1006, such as random access memory (RAM) or another dynamic storage device, coupled to bus 1002 for storing information and instructions to be executed by processor 1004. Main memory 1006 can also be used to store temporary variables or other intermediate information during execution of instructions to be executed by processor 1004. Such instructions, when stored on a non-transient storage medium accessible to processor 1004, transform computer system 1000 into a special-purpose machine that is tailored to perform the operations specified in the instructions.
[000165] Computer system 1000 further includes a read-only memory (ROM) 1008 or other static storage device coupled to bus 1002 for storing static information and instructions for processor 1004. A storage device 1010 such as a magnetic disk or optical disk is provided and coupled to bus 1002 to store information and instructions.
[000166] Computer system 1000 can be coupled via bus 1002 to a display 1012, such as a cathode ray tube (CRT), for presenting information to a computer user. An input device 1014, which includes alphanumeric and other keys, is coupled to bus 1002 for communicating information and command selections to processor 1004. Another type of user input device is cursor control 1016, such as a mouse, a trackball, or cursor direction keys, for communicating direction information and command selections to processor 1004 and for controlling cursor movement on display 1012. This input device typically has two degrees of freedom on two axes, a first axis (e.g., x) and a second axis (e.g., y), which allows the device to specify positions in a plane.
[000167] Computer system 1000 can implement the techniques described herein using custom hard-wired logic, one or more ASICs or FPGAs, firmware and/or program logic that, in combination with the computer system, makes computer system 1000 a special-purpose machine. According to one embodiment, the techniques herein are performed by computer system 1000 in response to processor 1004 executing one or more sequences of one or more instructions contained in main memory 1006. Such instructions can be read into main memory 1006 from another storage medium, such as storage device 1010. Execution of the sequences of instructions contained in main memory 1006 causes processor 1004 to perform the process steps described herein. In alternative embodiments, hard-wired circuitry can be used in place of or in combination with software instructions.
[000168] The term "storage media" as used herein, refers to any non-transient medium that stores data and/or instructions that cause a machine to operate in a specific manner. Such storage media may comprise non-volatile media and/or volatile media. Non-volatile media includes, for example, optical or magnetic disks, such as storage device 1010. Volatile media includes dynamic memory such as main memory 1006. Common forms of storage media include, for example, a floppy disk, a floppy disk , hard disk, solid state hard disk, magnetic tape, or any other magnetic data storage media, a CD-ROM, any other optical data storage media, any physical media with hole patterns, a RAM, an EPROM , and EEPROM, a FLASH-EPROM, NVRAM, or any other memory chip or cartridge.
[000169] Storage media are distinct from, but can be used in conjunction with, transmission media. Transmission media participate in transferring information between storage media. For example, transmission media include coaxial cables, copper wire and fiber optics, including the wires that comprise bus 1002. Transmission media can also take the form of acoustic or light waves, such as those generated during radio-wave and infrared data communications.
[000170] Various forms of media can be involved in carrying one or more sequences of one or more instructions to processor 1004 for execution. For example, the instructions can initially be carried on the magnetic disk or solid state drive of a remote computer. The remote computer can load the instructions into its dynamic memory and send the instructions over a telephone line using a modem. A modem local to computer system 1000 can receive the data on the telephone line and use an infrared transmitter to convert the data to an infrared signal. An infrared detector can receive the data carried in the infrared signal, and appropriate circuitry can place the data on bus 1002. Bus 1002 carries the data to main memory 1006, from which processor 1004 retrieves and executes the instructions. The instructions received by main memory 1006 may optionally be stored on storage device 1010 either before or after execution by processor 1004.
[000171] Computer system 1000 also includes a communication interface 1018 coupled to bus 1002. Communication interface 1018 provides a two-way data communication coupling to a network link 1020 that is connected to a local network 1022. For example, communication interface 1018 may be an integrated services digital network (ISDN) card, cable modem, satellite modem, or a modem to provide a data communication connection to a corresponding type of telephone line. As another example, communication interface 1018 may be a local area network (LAN) card to provide a data communication connection to a compatible LAN. Wireless links can also be implemented. In any such implementation, communication interface 1018 sends and receives electrical, electromagnetic, or optical signals that carry digital data streams that represent various types of information.
[000172] Network link 1020 typically provides data communication through one or more networks to other data devices. For example, network link 1020 may provide a connection through local network 1022 to a host computer 1024 or to data equipment operated by an Internet Service Provider (ISP) 1026. ISP 1026 in turn provides data communication services through the worldwide packet data communication network now commonly referred to as the "Internet" 1028. Local network 1022 and Internet 1028 both use electrical, electromagnetic, or optical signals that carry digital data streams. The signals through the various networks and the signals on network link 1020 and through communication interface 1018, which carry digital data to and from computer system 1000, are exemplary forms of transmission media.
[000173] Computer system 1000 can send messages and receive data, including program code, through the network(s), network link 1020 and communication interface 1018. In the Internet example, a server 1030 can transmit a requested code for an application program through Internet 1028, ISP 1026, local network 1022 and communication interface 1018.
[000174] The received code may be executed by processor 1004 as it is received, and/or stored in storage device 1010, or other non-volatile storage, for later execution. 8. Numbered example embodiments
[000175] Thus, embodiments of the present invention may relate to one or more of the example embodiments numbered below, each of which is an example and, as with any other related discussion provided above, should not be construed as limiting any claim or claims provided further below as they stand now or as later amended, replaced, or added. Likewise, these examples should not be considered as limiting with respect to any claim or claims of any related patents and/or patent applications (including any foreign or international counterpart applications and/or patents, divisionals, continuations, re-issues, etc.).
[000176] Example embodiment numbered 1 is a method comprising: determining, by a first device in a media processing chain, whether a type of media processing has been performed on an output version of media data; and, in response to determining, by the first device, that the type of media processing has been performed on the output version of the media data, performing: (a) creating, by the first device, a state of the media data, the state specifying the type of media processing performed on the output version of the media data, and (b) communicating, from the first device to a second device downstream in the media processing chain, the output version of the media data and the state of the media data.
[000177] Example embodiment numbered 2 is a method as described in example embodiment numbered 1, in which the media data comprises media content as one or more of: audio content only, video content only, or both audio content and video content.
[000178] Example embodiment numbered 3 is a method as described in example embodiment numbered 1, further comprising providing, to the second device, the state of the media data as one or more of: (a) media fingerprints; (b) processing state metadata; (c) extracted media characteristic values; (d) descriptions and/or values of media class types or subtypes; (e) media characteristic class and/or subclass probability values; (f) a cryptographic hash value; or (g) media processing signaling.
[000179] Example embodiment numbered 4 is a method as described in example embodiment numbered 1, further comprising: storing a media processing data block in a media processing database, in which the media processing data block comprises media processing metadata, and in which the media processing data block is retrievable based on one or more media fingerprints that are associated with the media processing data block.
[000180] Example embodiment numbered 5 is a method as described in example embodiment numbered 1, in which the state of the media data comprises a cryptographic hash value encrypted with credential information, and in which the cryptographic hash value is to be authenticated by a receiving device.
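One way to realize such a cryptographic hash value, shown here only as a hedged sketch that assumes an HMAC keyed by shared credential information over the serialized state, is the following:

    # Illustrative HMAC realization of the cryptographic hash value of example
    # embodiment numbered 5; key handling and serialization are assumptions.
    import hashlib
    import hmac
    import json

    SHARED_CREDENTIAL = b"shared-secret"  # hypothetical credential information

    def sign_state(state: dict) -> bytes:
        payload = json.dumps(state, sort_keys=True).encode()
        return hmac.new(SHARED_CREDENTIAL, payload, hashlib.sha256).digest()

    def authenticate(state: dict, tag: bytes) -> bool:
        return hmac.compare_digest(sign_state(state), tag)

    state = {"volume_leveled": True, "dialnorm": -24}
    tag = sign_state(state)
    print(authenticate(state, tag))   # True: the state is accepted as valid
    state["volume_leveled"] = False
    print(authenticate(state, tag))   # False: a tampered state is rejected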
[000181] Example embodiment numbered 6 is a method as described in example embodiment numbered 1, in which at least a portion of the state of the media data comprises one or more secure communication channels hidden in the media data, and in which the one or more secure communication channels are to be authenticated by a receiving device.
[000182] Example embodiment numbered 7 is a method as described in example embodiment numbered 6, in which the one or more secure communication channels comprise at least one spread-spectrum secure communication channel.
[000183] Example embodiment numbered 8 is a method as described in example embodiment numbered 6, in which the one or more secure communication channels comprise at least one frequency shift keying secure communication channel.
[000184] Example embodiment numbered 9 is a method as described in example embodiment numbered 1, in which the state of the media data is carried with the output version of the media data in an output media bitstream.
[000185] Example embodiment numbered 10 is a method as described in example embodiment numbered 1, in which the state of the media data is carried in an auxiliary metadata bitstream associated with a separate media bitstream that carries the output version of the media data.
[000186] Example embodiment numbered 11 is a method as described in example embodiment numbered 1, in which the state of the media data comprises one or more sets of parameters that relate to the type of media processing.
[000187] Example embodiment numbered 12 is a method as described in example embodiment numbered 1, in which at least one of the first device and the second device comprises one or more of: pre-processing units, encoders, media processing sub-units, transcoders, decoders, post-processing units, or media content rendering sub-units.
[000188] Example embodiment numbered 13 is a method as described in example embodiment numbered 1, in which the first device is an encoder and the second device is a decoder.
[000189] Example embodiment numbered 14 is a method as described in example embodiment numbered 1, further comprising: performing, by the first device, the type of media processing.
[000190] Example embodiment numbered 15 is a method as described in example embodiment numbered 1, in which the type of media processing has been performed by an upstream device relative to the first device in the media processing chain; and further comprising: receiving, by the first device, an input version of the media data, in which the input version of the media data does not comprise any state of the media data that indicates the type of media processing; and analyzing the input version of the media data to determine the type of media processing that has already been performed on the input version of the media data.
[000191] Example embodiment numbered 16 is a method as described in example embodiment numbered 1, further comprising encoding loudness and dynamic range values in the state of the media data.
[000192] Example embodiment numbered 17 is a method as described in example embodiment numbered 1, in which the type of media processing was previously performed by an upstream device relative to the first device in the media processing chain; and further comprising: receiving, by the first device, a command to override the type of media processing previously performed; performing, by the first device, the type of media processing; and communicating, from the first device to a second device downstream in the media processing chain, an output version of the media data and a state of the media data that indicates that the type of media processing has already been performed on the output version of the media data.
[000193] Example embodiment numbered 18 is a method as described in example embodiment numbered 17, further comprising receiving the command from one of: (a) user input, (b) a system configuration of the first device, (c) signaling from a device external to the first device, or (d) signaling from a sub-unit within the first device.
[000194] Example embodiment numbered 19 is a method as described in example embodiment numbered 1, further comprising communicating, from the first device to the second device downstream in the media processing chain, one or more types of metadata independent of the state of the media data.
[000195] Example embodiment numbered 20 is a method as described in example embodiment numbered 1, in which the state of the media data comprises at least a portion of state metadata hidden in one or more secure communication channels.
[000196] Example embodiment numbered 21 is a method as described in example embodiment numbered 1, further comprising altering a plurality of bytes in the media data to store at least a portion of the state of the media data.
[000197] Example embodiment numbered 22 is a method as described in example embodiment numbered 1, in which at least one of the first device and the second device comprises one or more of Advanced Television Systems Committee (ATSC) codecs, Moving Picture Experts Group (MPEG) codecs, Audio Codec 3 (AC-3) codecs, and Enhanced AC-3 codecs.
[000198] Example embodiment numbered 23 is a method as described in example embodiment numbered 1, in which the media processing chain comprises: a pre-processing unit configured to accept time-domain samples comprising media content as input and to output processed time-domain samples; an encoder configured to output a compressed media bitstream of the media content based on the processed time-domain samples; a signal analysis and metadata correction unit configured to validate processing state metadata in the compressed media bitstream; a transcoder configured to modify the compressed media bitstream; a decoder configured to output decoded time-domain samples based on the compressed media bitstream; and a post-processing unit configured to perform post-processing of the media content on the decoded time-domain samples.
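The chain recited in example embodiment numbered 23 can be pictured as a composition of stages that thread the media data and its state through the chain; the following sketch is illustrative only, with no-op stand-ins for each unit:

    # Hypothetical stand-ins for the chain of example embodiment numbered 23;
    # each stage receives and returns (media, state).
    def preprocess(media, state):   return media, state
    def encode(media, state):       return media, state
    def validate_metadata(media, state):
        state.setdefault("metadata_validated", True)
        return media, state
    def transcode(media, state):    return media, state
    def decode(media, state):       return media, state
    def postprocess(media, state):  return media, state

    CHAIN = [preprocess, encode, validate_metadata, transcode, decode, postprocess]

    media, state = "time-domain samples", {}
    for stage in CHAIN:
        media, state = stage(media, state)
    print(state)  # {'metadata_validated': True}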
[000199] Example embodiment numbered 24 is a method as described in example embodiment numbered 23, in which at least one of the first device and the second device comprises one or more of the pre-processing unit, the signal analysis and metadata correction unit, the transcoder, the decoder, and the post-processing unit.
[000200] Example embodiment numbered 25 is a method as described in example embodiment numbered 23, in which at least one of the pre-processing unit, the signal analysis and metadata correction unit, the transcoder, the decoder, and the post-processing unit performs adaptive processing of the media content based on processing metadata received from an upstream device.
[000201] Example embodiment numbered 26 is a method as described in example embodiment numbered 1, further comprising: determining one or more media characteristics from the media data; and including a description of the one or more media characteristics in the state of the media data.
[000202] Example embodiment numbered 27 is a method as described in example embodiment numbered 26, in which the one or more media characteristics comprise at least one media characteristic determined from one or more of frames, seconds, minutes, user-definable time intervals, scenes, songs, music pieces, and recordings.
[000203] Example embodiment numbered 28 is a method as described in example embodiment numbered 26, in which the one or more media characteristics comprise a semantic description of the media data.
[000204] Example embodiment numbered 29 is a method as described in example embodiment numbered 26, in which the one or more media characteristics comprise one or more of structural properties, tonality including harmony and melody, timbre, rhythm, loudness, stereo mix, a quantity of sound sources of the media data, absence or presence of voice, repetition characteristics, melody, harmonies, lyrics, timbre, perceptual characteristics, digital media characteristics, stereo parameters, and one or more portions of speech content.
[000205] Example embodiment numbered 30 is a method as described in example embodiment numbered 26, further comprising using the one or more media characteristics to classify the media data into one or more media data classes among a plurality of media data classes.
[000206] Example embodiment numbered 31 is a method as described in example embodiment numbered 30, in which the one or more media data classes comprise one or more of a single global/dominant media data class for the entire piece of media, or a single class that represents a period of time smaller than the entire piece of media. Example embodiment numbered 32 is a method as described in example embodiment numbered 31, in which the smaller period of time represents one or more of a single media frame, a single media data block, multiple media frames, multiple media data blocks, a fraction of a second, a second, or multiple seconds.
[000207] Example embodiment numbered 33 is a method as described in example embodiment numbered 30, in which one or more media data class labels representing the one or more media data classes are computed and inserted into a bitstream.
[000208] Example embodiment numbered 34 is a method as described in example embodiment numbered 30, in which the one or more media data class labels representing the one or more media data classes are computed and signaled to a receiving media processing node as hidden data embedded with the media data.
[000209] Example embodiment numbered 35 is a method as described in example embodiment numbered 30, in which the one or more media data class labels representing the one or more media data classes are computed and signaled to a receiving media processing node in a separate metadata structure between media data blocks.
[000210] Example embodiment numbered 36 is a method as described in example embodiment numbered 31, in which the single global/dominant media data class represents one or more of a single class type, such as music, speech, noise, silence, or applause, or a mixture of class types, such as speech over music, conversation over noise, or another mixture of media data types.
[000211] Example embodiment numbered 37 is a method as described in example embodiment numbered 30, further comprising associating one or more likelihood or probability values with the one or more media data class labels, in which a likelihood or probability value represents the level of confidence that the computed media class label has with respect to the media segment/block with which the computed media class label is associated.
[000212] Example embodiment numbered 38 is a method as described in example embodiment numbered 37, in which the likelihood or probability value is used by a receiving media processing node in the media processing chain to adapt processing in a manner that improves one or more operations such as upmixing, encoding, decoding, transcoding, or headphone virtualization.
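As a hedged illustration of example embodiments numbered 37 and 38 (the 0.8 threshold and the field names below are assumptions, not part of the specification), a receiving node might trust the upstream class label only above a confidence threshold:

    # Hypothetical confidence-gated adaptation at a receiving node.
    def adapt_processing(label: str, confidence: float,
                         threshold: float = 0.8) -> str:
        if confidence >= threshold:
            # Trust the upstream classifier and pick tuned parameters.
            return f"use-{label}-tuned-parameters"
        # Confidence too low: fall back to local analysis.
        return "run-local-analysis"

    print(adapt_processing("music", 0.95))   # use-music-tuned-parameters
    print(adapt_processing("speech", 0.40))  # run-local-analysis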
[000213] Example embodiment numbered 39 is a method as described in example embodiment numbered 38, in which at least one of the one or more operations eliminates the need to preset processing parameters, reduces the complexity of processing units throughout the chain, or extends battery life, since complex analysis operations to classify the media data are avoided at the receiving media processing node.
[000214] Example embodiment numbered 40 is a method comprising: determining, by a first device in a media processing chain, whether a type of media processing has already been performed on an input version of media data; and, in response to determining, by the first device, that the type of media processing has already been performed on the input version of the media data, adapting processing of the media data to disable performing the type of media processing in the first device; in which the method is performed by one or more computing devices.
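A minimal sketch of this adaptation, assuming a volume-leveling processing type and a simple dictionary-based state (both illustrative assumptions):

    # Hypothetical first device that disables redundant volume leveling.
    def volume_level(media: str) -> str:
        return f"leveled({media})"

    def process(media: str, state: dict):
        if state.get("volume_leveled"):
            # Adapt: the processing type was already performed upstream.
            return media, state
        media = volume_level(media)
        state = dict(state, volume_leveled=True)
        return media, state

    media, state = process("audio", {})
    media, state = process(media, state)  # second pass is a no-op
    print(media, state)  # leveled(audio) {'volume_leveled': True}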
[000215] Example embodiment numbered 41 is a method as described in example embodiment numbered 40, further comprising communicating, from the first device to a second device downstream in the media processing chain, an output version of the media data and a state of the media data that indicates that the type of media processing has already been performed on the output version of the media data.
[000216] Example embodiment numbered 42 is a method as described in example embodiment numbered 41, further comprising encoding loudness and dynamic range values in the state of the media data.
[000217] Example embodiment numbered 43 is a method as described in example embodiment numbered 40, further comprising: performing, by the first device, a second type of media processing on the media data, the second type of media processing being different from the type of media processing; and communicating, from the first device to a second device downstream in the media processing chain, an output version of the media data and a state of the media data that indicates that the type of media processing and the second type of media processing have already been performed on the output version of the media data.
[000218] Example embodiment numbered 44 is a method as described in example embodiment numbered 40, further comprising automatically performing one or more of adaptive corrective loudness or dynamics audio processing based at least in part on whether the type of media processing has previously been performed on the output version of the media data.
[000219] Example embodiment numbered 45 is a method as described in example embodiment numbered 40, further comprising extracting an input state of the media data from data units in the media data that encode media content, in which the input state of the media data is hidden in one or more of the data units.
[000220] Example embodiment numbered 46 is a method as described in example embodiment numbered 45, further comprising recovering a version of the data units that do not comprise the input state of the media data, and rendering the media content based on the version of the data units that has been recovered.
[000221] Example embodiment numbered 47 is a method as described in example embodiment numbered 46, further comprising retrieving an input state of the media data that is associated with the output version of the media data.
[000222] Example embodiment numbered 48 is a method as described in example embodiment numbered 47, further comprising authenticating the input state of the media data by validating a cryptographic hash value associated with the input state of the media data.
[000223] Example embodiment numbered 49 is a method as described in example embodiment numbered 47, further comprising authenticating the input state of the media data by validating one or more fingerprints associated with the input state of the media data, in which at least one of the one or more fingerprints is generated based on at least a portion of the media data.
[000224] Example embodiment numbered 50 is a method as described in example embodiment numbered 47, further comprising validating the media data by validating one or more fingerprints associated with the input state of the media data, in which at least one of the one or more fingerprints is generated based on at least a portion of the media data.
[000225] Example embodiment numbered 51 is a method as described in example embodiment numbered 47, in which the input state of the media data is carried with the input version of the media data in an input media bitstream.
[000226] Example embodiment numbered 52 is a method as described in example embodiment numbered 47, further comprising turning off one or more types of media processing based on the input state of the media data.
[000227] Example embodiment numbered 53 is a method as described in example embodiment numbered 47, in which the input state of the media data is described with processing state metadata; and further comprising: creating media processing signaling based at least in part on the processing state metadata, in which the media processing signaling indicates the input state of the media data; and transmitting the media processing signaling to a media processing device downstream of the first device in the media processing chain.
[000228] Example embodiment numbered 54 is a method as described in example embodiment numbered 53, in which the media processing signaling is hidden in one or more data units in an output version of the media data.
[000229] Example embodiment numbered 55 is a method as described in example embodiment numbered 54, in which the media processing signaling is performed using a reversible data hiding technique, such that one or more modifications to the media data are removable by a receiving device.
[000230] Example embodiment numbered 56 is a method as described in example embodiment numbered 54, in which the media processing signaling is performed using an irreversible data hiding technique, such that at least one of the one or more modifications to the media data is not removable by a receiving device.
[000231] Example embodiment numbered 57 is a method as described in example embodiment numbered 46, further comprising receiving, from an upstream device in the media processing chain, one or more types of metadata independent of any past media processing performed on the media data.
[000232] Example embodiment numbered 58 is a method as described in example embodiment numbered 47, in which the state of the media data comprises at least a portion of state metadata hidden in one or more secure communication channels.
[000233] Example embodiment numbered 59 is a method as described in example embodiment numbered 46, further comprising altering a plurality of bytes in the media data to store at least a portion of a state of the media data.
[000234] Example embodiment numbered 60 is a method as described in example embodiment numbered 46, in which the first device comprises one or more of Advanced Television Systems Committee (ATSC) codecs, Moving Picture Experts Group (MPEG) codecs, Audio Codec 3 (AC-3) codecs, and Enhanced AC-3 codecs.
[000235] Example embodiment numbered 61 is a method as described in example embodiment numbered 46, in which the media processing chain comprises: a pre-processing unit configured to accept time-domain samples comprising media content as input and to output processed time-domain samples; an encoder configured to output a compressed media bitstream of the media content based on the processed time-domain samples; a signal analysis and metadata correction unit configured to validate processing state metadata in the compressed media bitstream; a transcoder configured to modify the compressed media bitstream; a decoder configured to output decoded time-domain samples based on the compressed media bitstream; and a post-processing unit configured to perform post-processing of the media content on the decoded time-domain samples.
[000236] Example embodiment numbered 62 is a method as described in example embodiment numbered 61, in which the first device comprises one or more of the pre-processing unit, the signal analysis and metadata correction unit, the transcoder, the decoder, and the post-processing unit.
[000237] Example embodiment numbered 63 is a method as described in example embodiment numbered 61, in which at least one of the pre-processing unit, the signal analysis and metadata correction unit, the transcoder, the decoder, and the post-processing unit performs adaptive processing of the media content based on processing metadata received from an upstream device.
[000238] Example embodiment numbered 64 is a method as described in example embodiment numbered 47, further comprising determining one or more media characteristics based on a description of the one or more media characteristics in the state of the media data.
[000239] Example embodiment numbered 65 is a method as described in example embodiment numbered 64, in which the one or more media characteristics comprise at least one media characteristic determined from one or more of frames, seconds, minutes, user-definable time intervals, scenes, songs, music pieces, and recordings.
[000240] Example embodiment numbered 66 is a method as described in example embodiment numbered 64, in which the one or more media characteristics comprise a semantic description of the media data.
[000241] Example embodiment numbered 67 is a method as described in example embodiment numbered 64, further comprising performing one or more specific operations in response to determining the one or more media characteristics.
[000242] Example embodiment numbered 68 is a method as described in example embodiment numbered 64, further comprising providing, to the second device in the media processing chain, the state of the media data as one or more of: (a) media fingerprints; (b) processing state metadata; (c) extracted media characteristic values; (d) descriptions and/or values of media class types or subtypes; (e) media characteristic class and/or subclass probability values; (f) a cryptographic hash value; or (g) media processing signaling.
[000243] Example embodiment numbered 69 is a method comprising: computing, with a first device in a media processing chain, one or more reduced data rate representations of a source frame of media data; and carrying the one or more reduced data rate representations simultaneously and securely, within a state of the media data itself, to a second device in the media processing chain; in which the method is performed by one or more computing devices.
[000244] Example embodiment numbered 70 is a method as described in example embodiment numbered 69, in which the one or more reduced data rate representations are carried in at least one of a substream, one or more reserved fields, an add_bsi field, one or more auxiliary data fields, or one or more transform coefficients.
[000245] Example embodiment numbered 71 is a method as described in example embodiment numbered 69, in which the one or more reduced data rate representations comprise synchronization data used to synchronize audio and video delivered within the media data.
[000246] Example embodiment numbered 72 is a method as described in example embodiment numbered 69, in which the one or more reduced data rate representations comprise media fingerprints (a) generated by a media processing unit and (b) embedded with the media data for one or more of quality monitoring, media rating, media tracking, or content searching.
[000247] Example embodiment numbered 73 is a method as described in example embodiment numbered 69, in which at least one of the one or more reduced data rate representations comprises at least a portion of state metadata hidden in one or more secure communication channels.
[000248] Example embodiment numbered 74 is a method as described in example embodiment numbered 69, further comprising altering a plurality of bytes in the media data to store at least a portion of one of the one or more reduced data rate representations.
[000249] Example embodiment numbered 75 is a method as described in example embodiment numbered 69, in which at least one of the first device and the second device comprises one or more of Advanced Television Systems Committee (ATSC) codecs, Moving Picture Experts Group (MPEG) codecs, Audio Codec 3 (AC-3) codecs, and Enhanced AC-3 codecs.
[000250] Example embodiment numbered 76 is a method as described in example embodiment numbered 69, in which the media processing chain comprises: a pre-processing unit configured to accept time-domain samples comprising media content as input and to output processed time-domain samples; an encoder configured to output a compressed media bitstream of the media content based on the processed time-domain samples; a signal analysis and metadata correction unit configured to validate processing state metadata in the compressed media bitstream; a transcoder configured to modify the compressed media bitstream; a decoder configured to output decoded time-domain samples based on the compressed media bitstream; and a post-processing unit configured to perform post-processing of the media content on the decoded time-domain samples.
[000251] Example embodiment numbered 77 is a method as described in example embodiment numbered 76, in which at least one of the first device and the second device comprises one or more of the pre-processing unit, the signal analysis and metadata correction unit, the transcoder, the decoder, and the post-processing unit.
[000252] Example embodiment numbered 78 is a method as described in example embodiment numbered 76, in which at least one of the pre-processing unit, the signal analysis and metadata correction unit, the transcoder, the decoder, and the post-processing unit performs adaptive processing of the media content based on processing metadata received from an upstream device.
[000253] Example embodiment numbered 79 is a method as described in example embodiment numbered 69, further comprising providing, to the second device, the state of the media data as one or more of: (a) media fingerprints; (b) processing state metadata; (c) extracted media characteristic values; (d) descriptions and/or values of media class types or subtypes; (e) media characteristic class and/or subclass probability values; (f) a cryptographic hash value; or (g) media processing signaling.
[000254] Example embodiment numbered 80 is a method comprising: adaptively processing, with one or more computing devices in a media processing chain comprising one or more of psychoacoustic units, transforms, waveform/spatial audio coding units, encoders, decoders, transcoders, or stream processors, an input version of media data based on a past history of loudness processing of the media data by one or more upstream media processing units, as indicated by a state of the media data; and normalizing loudness and/or dynamic range of an output version of the media data at the end of the media processing chain to consistent loudness and/or dynamic range values.
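The following hedged sketch illustrates the end-of-chain normalization of example embodiment numbered 80, assuming the state carries a measured program loudness in dB-like units and assuming a -24 target; both are illustrative assumptions:

    # Hypothetical end-of-chain loudness normalization driven by the state.
    def normalize_loudness(samples, state: dict, target: float = -24.0):
        measured = state.get("program_loudness")
        if measured is None:
            return samples, state              # no history known; leave as-is
        gain_db = target - measured
        gain = 10 ** (gain_db / 20.0)          # dB to linear gain
        out = [s * gain for s in samples]
        state = dict(state, program_loudness=target, applied_gain_db=gain_db)
        return out, state

    samples, state = normalize_loudness([0.1, -0.2, 0.3],
                                        {"program_loudness": -30.0})
    print(state)  # records the +6 dB gain alongside the new loudness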
[000255] Example embodiment numbered 81 is a method as described in example embodiment numbered 80, in which the consistent loudness value comprises a loudness value (1) controlled or selected by a user, or (2) adaptively signaled by a state in the input version of the media data.
[000256] Example embodiment numbered 82 is a method as described in example embodiment numbered 80, in which the loudness value is computed over the dialogue (speech) portions of the media data.
[000257] Example embodiment numbered 83 is a method as described in example embodiment numbered 80, in which the loudness value is computed over absolute, relative, and/or ungated portions of the media data.
[000258] Example embodiment numbered 84 is a method as described in example embodiment numbered 80, in which the consistent dynamic range value comprises a dynamic range value (1) controlled or selected by a user, or (2) adaptively signaled by a state in the input version of the media data.
[000259] Example embodiment numbered 85 is a method as described in example embodiment numbered 84, in which the dynamic range value is computed over the dialogue (speech) portions of the media data.
[000260] Example embodiment numbered 86 is a method as described in example embodiment numbered 84, in which the dynamic range value is computed over absolute, relative, and/or ungated portions of the media data.
[000261] Example embodiment numbered 87 is a method as described in example embodiment numbered 80, further comprising: computing one or more loudness and/or dynamic range gain control values to normalize the output version of the media data to a consistent loudness value and a consistent dynamic range; and simultaneously carrying the one or more loudness and/or dynamic range gain control values within a state of the output version of the media data at the end of the media processing chain, in which the one or more loudness and/or dynamic range gain control values are usable by another device to reverse the one or more loudness and/or dynamic range gain control values, so as to recover an original loudness value and an original dynamic range of the input version of the media data.
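As a minimal sketch of such reversible gain control (linear per-sample gain is an illustrative simplification, and the field names are assumptions):

    # Hypothetical reversible gain: the applied gain is carried in the state
    # so a downstream device can undo it and recover the original signal.
    def apply_gain(samples, gain_db: float, state: dict):
        gain = 10 ** (gain_db / 20.0)
        state = dict(state, gain_control_db=gain_db)
        return [s * gain for s in samples], state

    def reverse_gain(samples, state: dict):
        gain = 10 ** (state["gain_control_db"] / 20.0)
        return [s / gain for s in samples]

    normalized, state = apply_gain([0.1, -0.2], 6.0, {})
    print(reverse_gain(normalized, state))  # recovers the original samples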
[000262] Example embodiment numbered 88 is a method as described in example embodiment numbered 87, in which the one or more loudness and/or dynamic range gain control values representing the state of the output version of the media data are carried in at least one of a substream, one or more reserved fields, an add_bsi field, one or more auxiliary data fields, or one or more transform coefficients.
[000263] Example embodiment numbered 89 is a method as described in example embodiment numbered 80, further comprising computing and transmitting, by at least one of the one or more computing devices in the media processing chain, a cryptographic hash value based on the media data and/or the state of the media data, within one or more encoded bitstreams that carry the media data.
[000264] Example embodiment numbered 90 is a method as described in example embodiment numbered 89, further comprising: authenticating, by a receiving device, the cryptographic hash value; signaling, by the receiving device to one or more downstream media processing units, a determination of whether the state of the media data is valid; and signaling, by the receiving device to the one or more downstream media processing units, the state of the media data in response to determining that the state of the media data is valid.
[000265] Example embodiment numbered 91 is a method as described in example embodiment numbered 89, in which the cryptographic hash value representing the media data and/or the state of the media data is carried in at least one of a substream, one or more reserved fields, an add_bsi field, one or more auxiliary data fields, or one or more transform coefficients.
[000266] Example embodiment numbered 92 is a method as described in example embodiment numbered 80, in which the state of the media data comprises one or more of: (a) media fingerprints; (b) processing state metadata; (c) extracted media characteristic values; (d) descriptions and/or values of media class types or subtypes; (e) media characteristic class and/or subclass probability values; (f) a cryptographic hash value; or (g) media processing signaling.
[000267] Example embodiment numbered 93 is a method comprising performing one of inserting, extracting, or editing related and unrelated media data locations and/or a state of related and unrelated media data locations within one or more encoded bitstreams, by one or more computing devices in a media processing chain comprising one or more of psychoacoustic units, transforms, waveform/spatial audio coding units, encoders, decoders, transcoders, or stream processors.
[000268] Example embodiment numbered 94 is a method as described in example embodiment numbered 93, in which the one or more related or unrelated media data locations and/or the state of related and unrelated media data locations within the encoded bitstreams are carried in at least one of a substream, one or more reserved fields, an add_bsi field, one or more auxiliary data fields, or one or more transform coefficients.
[000269] Example embodiment numbered 95 is a method comprising performing one or more of inserting, extracting, or editing related and unrelated media data and/or a state of related and unrelated media data within one or more encoded bitstreams, by one or more computing devices in a media processing chain comprising one or more of psychoacoustic units, transforms, waveform/spatial audio coding units, encoders, decoders, transcoders, or stream processors.
[000270] Example embodiment numbered 96 is a method as described in example embodiment numbered 95, in which the one or more related and unrelated media data and/or the state of related and unrelated media data within the encoded bitstreams are carried in at least one of a substream, one or more reserved fields, an add_bsi field, one or more auxiliary data fields, or one or more transform coefficients.
[000271] Example embodiment numbered 97 is a method as described in example embodiment numbered 93, further comprising providing, from an upstream media processing device to a downstream media processing device, a state of the media data as one or more of: (a) media fingerprints; (b) processing state metadata; (c) extracted media characteristic values; (d) descriptions and/or values of media class types or subtypes; (e) media characteristic class and/or subclass probability values; (f) a cryptographic hash value; or (g) media processing signaling.
[000272] Example embodiment numbered 98 is a media processing system configured to compute and carry cryptographic hash values based on media data and/or a state of the media data within one or more encoded bitstreams, by one or more computing devices in a media processing chain comprising one or more of psychoacoustic units, transforms, waveform/spatial audio coding units, encoders, decoders, transcoders, or stream processors.
[000273] Example embodiment numbered 99 is a media processing system as described in example embodiment numbered 98, in which the state of the media data comprises one or more of: (a) media fingerprints; (b) processing state metadata; (c) extracted media characteristic values; (d) descriptions and/or values of media class types or subtypes; (e) media characteristic class and/or subclass probability values; (f) a cryptographic hash value; or (g) media processing signaling.
[000274] Example embodiment numbered 100 is a media processing system configured to adaptively process media data based on a state of the media data received from one or more secure communication channels.
[000275] Example embodiment numbered 101 is a media processing system as described in example embodiment numbered 100, in which the media processing system comprises one or more processing nodes, and in which the processing nodes comprise media delivery systems, media distribution systems, and media rendering systems.
[000276] Example embodiment numbered 102 is a media processing system as described in example embodiment numbered 101, in which the one or more secure communication channels comprise at least one secure communication channel across two or more compressed/encoded bitstream and PCM processing nodes.
[000277] Example embodiment numbered 103 is a media processing system as described in example embodiment numbered 101, in which the one or more secure communication channels comprise at least one secure communication channel across two separate media processing devices.
[000278] Example embodiment numbered 104 is a media processing system as described in example embodiment numbered 101, in which the one or more secure communication channels comprise at least one secure communication channel across two media processing nodes in a single media processing device.
[000279] Example embodiment numbered 105 is a media processing system as described in example embodiment numbered 100, in which the media processing system is configured to perform media processing operations autonomously, independent of how media processing systems are ordered in a media processing chain of which the media processing system is a part.
[000280] Example embodiment numbered 106 is a media processing system as described in example embodiment numbered 100, in which the state of the media data comprises one or more of: (a) media fingerprints; (b) processing state metadata; (c) extracted media characteristic values; (d) descriptions and/or values of media class types or subtypes; (e) media characteristic class and/or subclass probability values; (f) a cryptographic hash value; or (g) media processing signaling.
[000281] Example embodiment numbered 107 is a media processing system configured to perform any of the methods as described in example embodiments numbered 1-99.
[000282] Example embodiment numbered 108 is an apparatus comprising a processor and configured to perform any of the methods as described in example embodiments numbered 1-99.
[000283] Example embodiment numbered 109 is a computer-readable storage medium comprising software instructions which, when executed by one or more processors, cause any of the methods as described in example embodiments numbered 1-99 to be performed. 9. Equivalents, extensions, alternatives and miscellaneous
[000284] In the foregoing specification, possible embodiments of the invention have been described with reference to numerous specific details that may vary from implementation to implementation. Thus, the sole and exclusive indicator of what the invention is, and is intended by the applicants to be the invention, is the set of claims that issue from this application, in the specific form in which such claims issue, including any subsequent correction. Any definitions expressly set forth herein for terms contained in such claims shall govern the meaning of such terms as used in the claims. Hence, no limitation, element, property, characteristic, advantage, or attribute that is not expressly described in a claim should limit the scope of such claim in any way. The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense.
Claims (14)
1. Method characterized in that it comprises the steps of: determining (910), by means of a first device in a media processing chain, whether a type of media processing has been performed on an output version of media data; in response to determining (910), by means of the first device, that the type of media processing was performed on the output version of the media data, performing: creating or modifying, by means of the first device, a state of the media data, the state specifying the type of media processing performed on the output version of the media data; digitally signing, by the first device, the state of the media data with a cryptographic hash value; communicating, from the first device to a second device downstream in the media processing chain, the output version of the media data and the state of the media data; validating, by the second device, the state of the media data based on the cryptographic hash value; disabling, by the second device, execution of the type of media processing indicated by the state of the media data if the state of the media data is found to be valid; and performing, by the second device, the type of media processing indicated by the state of the media data if the state of the media data is found to be invalid.
2. Method according to claim 1, characterized in that it further comprises providing, to the second device, the state of the media data as one or more of: (a) media fingerprints; (b) processing state metadata; (c) extracted media characteristic values; (d) values and/or description(s) of media types or subtypes; (e) media characteristic class and/or subclass probability values; or (f) media processing signaling.
3. Method according to claim 1, characterized in that at least a part of the state of the media data comprises one or more secure communication channels hidden in the media data, and in which the one or more secure communication channels are to be authenticated by a receiving device.
4. Method according to claim 1, characterized in that the state of the media data is carried: with the output version of the media data in an output media bitstream; or in an auxiliary metadata bitstream associated with a separate media bitstream that carries the output version of the media data.
5. Method according to claim 1, characterized in that the state of the media data comprises one or more sets of parameters that relate to the type of media processing.
6. Method according to claim 1, characterized in that it further comprises performing, by means of the first device, the type of media processing.
7. Method according to claim 1, characterized in that the type of media processing was performed by an upstream device, relative to the first device, in the media processing chain; and further comprising: receiving, by means of the first device, an input version of the media data, in which the input version of the media data does not comprise any state of the media data that indicates the type of media processing; and analyzing the input version of the media data to determine the type of media processing already performed on the input version of the media data.
8. Method according to claim 1, characterized in that it further comprises: encoding loudness and dynamic range values in the state of the media data.
9. Method according to claim 1, characterized in that the type of media processing was previously performed by an upstream device, relative to the first device, in the media processing chain; and further comprising: receiving, by the first device, a command to override the type of media processing previously performed; performing, by the first device, the type of media processing; and communicating, from the first device to a second device downstream in the media processing chain, an output version of the media data and a state of the media data which indicates that the type of media processing has already been performed on the output version of the media data.
10. Method according to claim 1, characterized in that the state of the media data comprises at least a part of state metadata hidden in one or more secure communication channels, or in that it further comprises changing a plurality of bytes in the media data to store at least a part of the state of the media data.
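A common way to hide metadata inside media samples, and one possible reading of the hidden channels of claims 3 and 10, is least-significant-bit embedding. The sketch below is a generic steganographic illustration, not the patent's "secure communication channel"; it would normally be combined with the signing of claim 1 to make the channel authenticatable.

```python
# Generic LSB embedding sketch: state bytes overwrite the least-significant
# bit of successive media sample bytes (illustrative only).
def embed_state(samples: bytearray, state_bytes: bytes) -> bytearray:
    bits = [(byte >> i) & 1 for byte in state_bytes for i in range(8)]
    if len(bits) > len(samples):
        raise ValueError("media too short to carry the state")
    for i, bit in enumerate(bits):
        samples[i] = (samples[i] & 0xFE) | bit   # overwrite the LSB
    return samples

def extract_state(samples: bytes, n_bytes: int) -> bytes:
    out = bytearray()
    for b in range(n_bytes):
        byte = 0
        for i in range(8):
            byte |= (samples[b * 8 + i] & 1) << i
        out.append(byte)
    return bytes(out)

carrier = embed_state(bytearray(64), b"LEVELED")
assert extract_state(carrier, 7) == b"LEVELED"
```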
11. Method according to claim 1, characterized in that the media processing chain comprises: a pre-processing unit configured to accept, as input, time-domain samples comprising media content and to produce processed time-domain samples; an encoder configured to produce a compressed media bitstream of the media content based on the processed time-domain samples; a signal analysis and metadata correction unit configured to validate processing state metadata in the compressed media bitstream; a transcoder configured to modify the compressed media bitstream; a decoder configured to produce decoded time-domain samples based on the compressed media bitstream; and a post-processing unit configured to perform post-processing of the media content on the decoded time-domain samples.
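The six units of claim 11 form a linear chain in which each stage can read and extend the state of the media data. The stages below are stubs meant only to show the data flow; none of the stage behavior is taken from the patent.

```python
# Minimal sketch of the claim-11 chain as a sequence of stages, each of which
# passes the media payload and the state forward (stage bodies are stubs).
def pre_process(samples, state):   return samples, state
def encode(samples, state):        return b"compressed", state
def correct_metadata(bits, state): state.setdefault("validated", True); return bits, state
def transcode(bits, state):        return bits, state
def decode(bits, state):           return [0.0] * 480, state
def post_process(samples, state):  return samples, state

def run_chain(samples):
    state: dict = {}
    samples, state = pre_process(samples, state)   # time-domain pre-processing
    bits, state = encode(samples, state)           # compressed media bitstream
    bits, state = correct_metadata(bits, state)    # signal analysis / correction
    bits, state = transcode(bits, state)           # bitstream modification
    samples, state = decode(bits, state)           # decoded time-domain samples
    return post_process(samples, state)            # media content post-processing
```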
12. Method according to claim 1, characterized in that it further comprises: determining one or more media characteristics from the media data; and including a description of the one or more media characteristics in the state of the media data.
13. Apparatus characterized in that it comprises a processor and is configured to perform the method as defined in any one of claims 1 to 12.
14. Computer-readable storage medium characterized in that it comprises instructions which, when executed by a processor, cause the method as defined in any one of claims 1 to 12 to be performed.